RDMA Programming Basics
Source: Storage Master Class | RDMA Introduction and Programming Basics - https://zhuanlan.zhihu.com/p/387549948
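1. Setting up an RDMA learning environment
RDMA normally requires a dedicated RDMA NIC or an InfiniBand card. If you want to learn RDMA without such hardware, you can use a software RDMA emulation environment, softiwarp.
- Source repository: https://github.com/zrlio/softiwarp
- Installation tutorial: http://www.reflectionsofthevoid.com/2011/03/how-to-install-soft-iwarp-on-ubuntu.html
More rdmacm examples - https://github.com/tarickb/the-geek-in-the-corner
Note that these examples use IPv6 connections by default. To test in an IPv4 environment, change the code to use IPv4 addresses first (in the listing further down, the address family is selected by the _USE_IPV6 macro).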
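2. RDMA compared with sockets
Like socket connections, RDMA connections come in reliable and unreliable flavors, but the two are not identical. With sockets, the reliable connection is TCP, which is stream-oriented, and the unreliable one is UDP, which is message-oriented. With RDMA, both reliable and unreliable connections are message-oriented.
From a programming point of view, RDMA code is also split into a server side and a client side, and it also has bind, listen, connect, and accept operations. The details, however, differ in quite a few places.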
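The difference between stream and message semantics can be observed without any RDMA hardware. The sketch below is only an analogy: it uses a local AF_UNIX socket pair in place of TCP/UDP (and in place of RDMA), and the helper name first_read_len is made up for this illustration. It sends two separate messages and reports how many bytes a single read returns.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Writes two separate messages ("abc", then "defgh") into one end of a local
 * socket pair and returns how many bytes ONE read() on the other end yields.
 * `type` is SOCK_STREAM (stream-oriented, like TCP) or SOCK_DGRAM
 * (message-oriented, like UDP). Returns -1 on error.
 */
static long first_read_len(int type)
{
    int sv[2];
    char buf[64];
    long n;

    assert(type == SOCK_STREAM || type == SOCK_DGRAM);

    if (socketpair(AF_UNIX, type, 0, sv) != 0)
        return -1;

    /* Two distinct sends: 3 bytes, then 5 bytes. */
    if (write(sv[0], "abc", 3) != 3 || write(sv[0], "defgh", 5) != 5) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* One receive, with room for both messages. */
    n = (long)read(sv[1], buf, sizeof(buf));

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

On Linux, first_read_len(SOCK_STREAM) returns 8 because the stream has no message boundaries and both writes are delivered together, while first_read_len(SOCK_DGRAM) returns 3 because each message is delivered on its own. RDMA send/receive behaves like the second case for both reliable and unreliable connections: one posted receive is consumed by exactly one incoming message.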
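Mellanox's VMA is also worth a look. It apparently lets applications communicate through the plain socket API, which is much more convenient.
See: [RDMA] Besides RDMA (verbs), VMA can also reduce CPU usage | RDMA programming with sockets (bandaoyu's notes, CSDN) https://blog.csdn.net/bandaoyu/article/details/120726746
Judging from its introduction, VMA is a library that Mellanox provides for its kernel-bypass NICs (RDMA NICs). The library exposes the socket API, so user programs can use a kernel-bypass NIC such as an RDMA NIC without modification. RDMA NICs are currently programmed through the rdma_cm and verbs APIs, which differ from sockets, so being able to program RDMA through sockets would be a big win.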
rdma_cm API references
https://linux.die.net/man/3/rdma_create_id (recommended)
https://www.ibm.com/docs/en/aix/7.2?topic=operations-rdma-listen (less detailed)
The rdma_cm API manages connections (setup and teardown); the verbs API handles sending and receiving.
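RDMA hosts communicate through queue pairs (QPs). A host creates a QP made up of a send queue (SQ) and a receive queue (RQ), and uses the verbs API to post operations to these queues. So rdma_cm only manages the connection; sending and receiving still go through the verbs API.

3. Code flow of an RDMA server
main
{
1. channel = rdma_create_event_channel()
Creates an event channel. The event channel is how the RDMA device notifies the application when an operation completes or when an event such as a connection request occurs. Internally it is a file descriptor, so it can be used with poll and similar calls.
2. rdma_create_id(channel, &id, ...)
Creates an rdma_cm_id, conceptually equivalent to the listening socket in socket programming.
3. rdma_bind_addr(id, addr)
As with sockets, a local address and port must be bound first so that the id can listen.
4. rdma_listen(id, backlog)
Starts listening for client connection requests.
5. rdma_get_cm_event(channel, &event)
This call operates on the event channel created in step 1 and fetches one event from it. It is a blocking call and returns only when an event is available. If everything goes well, it returns an RDMA_CM_EVENT_CONNECT_REQUEST event, meaning a client has initiated a connection. The event carries a new rdma_cm_id. This differs from sockets, where a new socket fd is created only after accept.
on_event()
{
on_connect_request() // RDMA_CM_EVENT_CONNECT_REQUEST
{
build_context()
{
6. ibv_alloc_pd
Creates a protection domain. A protection domain can be seen as a unit of memory protection: it associates memory regions with queues and prevents unauthorized access.
7. ibv_create_comp_channel
Like the event channel created earlier, this is also an event channel, but it only reports events from the completion queue. When a new work completion lands in the completion queue, the application is notified through this channel.
8. ibv_create_cq
Creates the completion queue, bound at creation time to the channel from step 7.
} //--end build_context()
9. rdma_create_qp
Creates a queue pair, which consists of a send queue and a receive queue. The CQ created above is specified as its completion queue, and the QP is associated at creation time with the PD from step 6.
10. ibv_reg_mr
Registers a memory region. Memory used by RDMA must be registered in advance. This makes sense: memory used for DMA has requirements such as alignment and not being swapped out.
11. rdma_accept
All preparation is now done, and accept can be called to accept the client's request. Time to breathe out... but not so fast.
}
//--end on_connect_request()
12. rdma_ack_cm_event
Every event obtained from the event channel must be acknowledged with the ack function, otherwise memory leaks. This ack pairs with the get in step 5. Every get call needs a matching ack call.
13. rdma_get_cm_event
Call rdma_get_cm_event again. If everything goes well, it now returns an RDMA_CM_EVENT_ESTABLISHED event, meaning the connection is established. No extra handling is needed; just call rdma_ack_cm_event.
} //--end on_event()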
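The get/ack pairing in steps 5, 12, and 13 can be sketched as a toy model in plain C. Nothing here is the real rdma_cm API: every toy_ name is hypothetical, toy_get_event stands in for rdma_get_cm_event and toy_ack_event for rdma_ack_cm_event. The model follows the variant used by the server.c listing in this article, which copies the event, acks it immediately, and then handles the copy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for rdma_cm events; NOT the real API. */
enum toy_event_type { TOY_CONNECT_REQUEST, TOY_ESTABLISHED, TOY_DISCONNECTED };

struct toy_event {
    enum toy_event_type type;
    int conn_id;
};

/* Events fetched but not yet acked; a non-zero value at exit is a leak. */
static int outstanding_events = 0;

/* Models the "get" call: hands out an event owned by the library. */
static struct toy_event *toy_get_event(enum toy_event_type type, int conn_id)
{
    struct toy_event *ev = malloc(sizeof(*ev));

    if (ev == NULL)
        return NULL;
    ev->type = type;
    ev->conn_id = conn_id;
    outstanding_events++;
    return ev;
}

/* Models the "ack" call: gives the event back so it can be freed. */
static void toy_ack_event(struct toy_event *ev)
{
    assert(outstanding_events > 0);
    outstanding_events--;
    free(ev);
}

/* Handler works on a private copy. Returns 1 to stop the event loop. */
static int toy_on_event(const struct toy_event *ev)
{
    return ev->type == TOY_DISCONNECTED;
}

/* One loop iteration: get, copy, ack, then handle the copy. */
static int toy_process(enum toy_event_type type, int conn_id)
{
    struct toy_event *ev = toy_get_event(type, conn_id);
    struct toy_event copy;

    if (ev == NULL)
        return -1;
    memcpy(&copy, ev, sizeof(copy));
    toy_ack_event(ev);   /* ev must not be touched after this point */
    return toy_on_event(&copy);
}
```

Calling toy_process once per event keeps outstanding_events at zero between iterations, which is the property the "every get needs an ack" rule is meant to guarantee. One caveat for real code: a shallow copy only preserves fields stored by value, so anything the handler needs through a pointer inside the event should be read before the ack.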
Data transfer can finally begin (how to transfer is left for the next article).
Reference: http://10.165.104.246:8080/#/c/43882/
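Because the event channel is a file descriptor internally, a server does not have to block inside rdma_get_cm_event; it can wait on the descriptor with poll first. The sketch below shows only the waiting part in plain POSIX C. The helper names wait_readable and demo_with_pipe are made up for this illustration, and the pipe is merely a stand-in for the channel's descriptor so that the code runs without RDMA hardware.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Waits up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error.
 */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    assert(fd >= 0);
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;
    return (pfd.revents & POLLIN) ? 1 : -1;
}

/* Self-check that uses a pipe in place of the event channel descriptor. */
static int demo_with_pipe(void)
{
    int p[2];
    int before, after;

    if (pipe(p) != 0)
        return -1;

    before = wait_readable(p[0], 0);     /* nothing written yet */
    if (write(p[1], "x", 1) != 1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    after = wait_readable(p[0], 1000);   /* one byte is now pending */

    close(p[0]);
    close(p[1]);
    return (before == 0 && after == 1) ? 0 : -1;
}
```

With librdmacm, the descriptor to pass would be the fd member of the struct rdma_event_channel returned by rdma_create_event_channel, and rdma_get_cm_event would be called once wait_readable returns 1. The same approach should apply to the completion channel from step 7, which also exposes a descriptor.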
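4. Closing the connection
Disconnecting: when rdma_get_cm_event returns an RDMA_CM_EVENT_DISCONNECTED event, the client has disconnected and the server must clean up accordingly. Call rdma_ack_cm_event to release the event, then call the following functions in order to release the connection, memory, and queue resources.
rdma_disconnect
rdma_destroy_qp
ibv_dereg_mr
rdma_destroy_id (releases the rdma_cm_id of the client connection)
rdma_destroy_id (releases the rdma_cm_id used for listening)
rdma_destroy_event_channel (releases the event channel)
} // end main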
Example
Source code: https://github.com/tarickb/the-geek-in-the-corner

Usage
[root@localhost 01_basic-client-server]# ./server
listening on port 42956.
client <server-address> <server-port>

Makefile
.PHONY: clean

CFLAGS := -Wall -g
LDLIBS := ${LDLIBS} -lrdmacm -libverbs -lpthread

APPS := server client

all: ${APPS}

clean:
	rm -f ${APPS}

Note: the Makefile has no -L option to specify a library path, so the libraries behind -lrdmacm -libverbs -lpthread (librdmacm.so, libibverbs.so, libpthread.so) must be in a default search path such as /usr/lib or /usr/lib64.

Server side: server.c
#include <arpa/inet.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <rdma/rdma_cma.h>

/* TEST_NZ: die if the call returns non-zero. TEST_Z: die if it returns zero/NULL. */
#define TEST_NZ(x) do { if ( (x)) die("error: " #x " failed (returned non-zero)." ); } while (0)
#define TEST_Z(x)  do { if (!(x)) die("error: " #x " failed (returned zero/null)."); } while (0)

const int BUFFER_SIZE = 1024;

/* Per-device resources shared by all connections. */
struct context {
  struct ibv_context *ctx;
  struct ibv_pd *pd;
  struct ibv_cq *cq;
  struct ibv_comp_channel *comp_channel;

  pthread_t cq_poller_thread;
};

/* Per-connection resources. */
struct connection {
  struct ibv_qp *qp;

  struct ibv_mr *recv_mr;
  struct ibv_mr *send_mr;

  char *recv_region;
  char *send_region;
};

static void die(const char *reason);

static void build_context(struct ibv_context *verbs);
static void build_qp_attr(struct ibv_qp_init_attr *qp_attr);
static void * poll_cq(void *);
static void post_receives(struct connection *conn);
static void register_memory(struct connection *conn);

static void on_completion(struct ibv_wc *wc);
static int on_connect_request(struct rdma_cm_id *id);
static int on_connection(void *context);
static int on_disconnect(struct rdma_cm_id *id);
static int on_event(struct rdma_cm_event *event);

static struct context *s_ctx = NULL;

int main(int argc, char **argv)
{
#if _USE_IPV6
  struct sockaddr_in6 addr;
#else
  struct sockaddr_in addr;
#endif
  struct rdma_cm_event *event = NULL;
  struct rdma_cm_id *listener = NULL;
  struct rdma_event_channel *ec = NULL;
  uint16_t port = 0;

  memset(&addr, 0, sizeof(addr));
#if _USE_IPV6
  addr.sin6_family = AF_INET6;
#else
  addr.sin_family = AF_INET;
#endif

  TEST_Z(ec = rdma_create_event_channel());
  TEST_NZ(rdma_create_id(ec, &listener, NULL, RDMA_PS_TCP));
  TEST_NZ(rdma_bind_addr(listener, (struct sockaddr *)&addr));
  TEST_NZ(rdma_listen(listener, 10)); /* backlog=10 is arbitrary */

  /* rdma_get_src_port returns the port the listener is bound to */
  port = ntohs(rdma_get_src_port(listener));

  printf("listening on port %d.\n", port);

  while (rdma_get_cm_event(ec, &event) == 0) {
    struct rdma_cm_event event_copy;

    /* copy the event, ack it right away, then handle the copy */
    memcpy(&event_copy, event, sizeof(*event));
    rdma_ack_cm_event(event);

    if (on_event(&event_copy))
      break;
  }

  rdma_destroy_id(listener);
  rdma_destroy_event_channel(ec);

  return 0;
}

void die(const char *reason)
{
  fprintf(stderr, "%s\n", reason);
  exit(EXIT_FAILURE);
}

void build_context(struct ibv_context *verbs)
{
  if (s_ctx) {
    if (s_ctx->ctx != verbs)
      die("cannot handle events in more than one context.");

    return;
  }

  s_ctx = (struct context *)malloc(sizeof(struct context));

  s_ctx->ctx = verbs;

  TEST_Z(s_ctx->pd = ibv_alloc_pd(s_ctx->ctx));
  TEST_Z(s_ctx->comp_channel = ibv_create_comp_channel(s_ctx->ctx));
  TEST_Z(s_ctx->cq = ibv_create_cq(s_ctx->ctx, 10, NULL, s_ctx->comp_channel, 0)); /* cqe=10 is arbitrary */
  /* arm the CQ so that the next completion raises an event on the completion channel */
  TEST_NZ(ibv_req_notify_cq(s_ctx->cq, 0));

  TEST_NZ(pthread_create(&s_ctx->cq_poller_thread, NULL, poll_cq, NULL));
}

void build_qp_attr(struct ibv_qp_init_attr *qp_attr)
{
  memset(qp_attr, 0, sizeof(*qp_attr));

  qp_attr->send_cq = s_ctx->cq;
  qp_attr->recv_cq = s_ctx->cq;
  qp_attr->qp_type = IBV_QPT_RC; /* reliable connection */

  qp_attr->cap.max_send_wr = 10;
  qp_attr->cap.max_recv_wr = 10;
  qp_attr->cap.max_send_sge = 1;
  qp_attr->cap.max_recv_sge = 1;
}

void * poll_cq(void *ctx)
{
  struct ibv_cq *cq;
  struct ibv_wc wc;

  while (1) {
    /* block until the completion channel reports an event, then re-arm and drain the CQ */
    TEST_NZ(ibv_get_cq_event(s_ctx->comp_channel, &cq, &ctx));
    ibv_ack_cq_events(cq, 1);
    TEST_NZ(ibv_req_notify_cq(cq, 0));

    while (ibv_poll_cq(cq, 1, &wc))
      on_completion(&wc);
  }

  return NULL;
}

void post_receives(struct connection *conn)
{
  struct ibv_recv_wr wr, *bad_wr = NULL;
  struct ibv_sge sge;

  /* wr_id carries the connection pointer back to on_completion */
  wr.wr_id = (uintptr_t)conn;
  wr.next = NULL;
  wr.sg_list = &sge;
  wr.num_sge = 1;

  sge.addr = (uintptr_t)conn->recv_region;
  sge.length = BUFFER_SIZE;
  sge.lkey = conn->recv_mr->lkey;

  TEST_NZ(ibv_post_recv(conn->qp, &wr, &bad_wr));
}

void register_memory(struct connection *conn)
{
  conn->send_region = malloc(BUFFER_SIZE);
  conn->recv_region = malloc(BUFFER_SIZE);

  TEST_Z(conn->send_mr = ibv_reg_mr(
    s_ctx->pd,
    conn->send_region,
    BUFFER_SIZE,
    0));

  /* the receive buffer is written by the NIC, so it needs local write access */
  TEST_Z(conn->recv_mr = ibv_reg_mr(
    s_ctx->pd,
    conn->recv_region,
    BUFFER_SIZE,
    IBV_ACCESS_LOCAL_WRITE));
}

void on_completion(struct ibv_wc *wc)
{
  if (wc->status != IBV_WC_SUCCESS)
    die("on_completion: status is not IBV_WC_SUCCESS.");

  if (wc->opcode & IBV_WC_RECV) {
    struct connection *conn = (struct connection *)(uintptr_t)wc->wr_id;

    printf("received message: %s\n", conn->recv_region);

  } else if (wc->opcode == IBV_WC_SEND) {
    printf("send completed successfully.\n");
  }
}

int on_connect_request(struct rdma_cm_id *id)
{
  struct ibv_qp_init_attr qp_attr;
  struct rdma_conn_param cm_params;
  struct connection *conn;

  printf("received connection request.\n");

  build_context(id->verbs);
  build_qp_attr(&qp_attr);

  TEST_NZ(rdma_create_qp(id, s_ctx->pd, &qp_attr));

  id->context = conn = (struct connection *)malloc(sizeof(struct connection));
  conn->qp = id->qp;

  /* register buffers and post a receive before accepting */
  register_memory(conn);
  post_receives(conn);

  memset(&cm_params, 0, sizeof(cm_params));
  TEST_NZ(rdma_accept(id, &cm_params));

  return 0;
}

int on_connection(void *context)
{
  struct connection *conn = (struct connection *)context;
  struct ibv_send_wr wr, *bad_wr = NULL;
  struct ibv_sge sge;

  snprintf(conn->send_region, BUFFER_SIZE, "message from passive/server side with pid %d", getpid());

  printf("connected. posting send...\n");

  memset(&wr, 0, sizeof(wr));

  wr.opcode = IBV_WR_SEND;
  wr.sg_list = &sge;
  wr.num_sge = 1;
  wr.send_flags = IBV_SEND_SIGNALED;

  sge.addr = (uintptr_t)conn->send_region;
  sge.length = BUFFER_SIZE;
  sge.lkey = conn->send_mr->lkey;

  TEST_NZ(ibv_post_send(conn->qp, &wr, &bad_wr));

  return 0;
}

int on_disconnect(struct rdma_cm_id *id)
{
  struct connection *conn = (struct connection *)id->context;

  printf("peer disconnected.\n");

  rdma_destroy_qp(id);

  ibv_dereg_mr(conn->send_mr);
  ibv_dereg_mr(conn->recv_mr);

  free(conn->send_region);
  free(conn->recv_region);

  free(conn);

  rdma_destroy_id(id);

  return 0;
}

int on_event(struct rdma_cm_event *event)
{
  int r = 0;

  if (event->event == RDMA_CM_EVENT_CONNECT_REQUEST)
    r = on_connect_request(event->id);
  else if (event->event == RDMA_CM_EVENT_ESTABLISHED)
    r = on_connection(event->id->context);
  else if (event->event == RDMA_CM_EVENT_DISCONNECTED)
    r = on_disconnect(event->id);
  else
    die("on_event: unknown event.");

  return r;
}

Client side: client.c
#include <netdb.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <rdma/rdma_cma.h>

/* TEST_NZ: die if the call returns non-zero. TEST_Z: die if it returns zero/NULL. */
#define TEST_NZ(x) do { if ( (x)) die("error: " #x " failed (returned non-zero)." ); } while (0)
#define TEST_Z(x)  do { if (!(x)) die("error: " #x " failed (returned zero/null)."); } while (0)

const int BUFFER_SIZE = 1024;
const int TIMEOUT_IN_MS = 500; /* ms */

/* Per-device resources shared by all connections. */
struct context {
  struct ibv_context *ctx;
  struct ibv_pd *pd;
  struct ibv_cq *cq;
  struct ibv_comp_channel *comp_channel;

  pthread_t cq_poller_thread;
};

/* Per-connection resources. */
struct connection {
  struct rdma_cm_id *id;
  struct ibv_qp *qp;

  struct ibv_mr *recv_mr;
  struct ibv_mr *send_mr;

  char *recv_region;
  char *send_region;

  int num_completions;
};

static void die(const char *reason);

static void build_context(struct ibv_context *verbs);
static void build_qp_attr(struct ibv_qp_init_attr *qp_attr);
static void * poll_cq(void *);
static void post_receives(struct connection *conn);
static void register_memory(struct connection *conn);

static int on_addr_resolved(struct rdma_cm_id *id);
static void on_completion(struct ibv_wc *wc);
static int on_connection(void *context);
static int on_disconnect(struct rdma_cm_id *id);
static int on_event(struct rdma_cm_event *event);
static int on_route_resolved(struct rdma_cm_id *id);

static struct context *s_ctx = NULL;

int main(int argc, char **argv)
{
  struct addrinfo *addr;
  struct rdma_cm_event *event = NULL;
  struct rdma_cm_id *conn = NULL;
  struct rdma_event_channel *ec = NULL;

  if (argc != 3)
    die("usage: client <server-address> <server-port>");

  TEST_NZ(getaddrinfo(argv[1], argv[2], NULL, &addr));

  TEST_Z(ec = rdma_create_event_channel());
  TEST_NZ(rdma_create_id(ec, &conn, NULL, RDMA_PS_TCP));
  /* address resolution is asynchronous; the result arrives as a CM event */
  TEST_NZ(rdma_resolve_addr(conn, NULL, addr->ai_addr, TIMEOUT_IN_MS));

  freeaddrinfo(addr);

  while (rdma_get_cm_event(ec, &event) == 0) {
    struct rdma_cm_event event_copy;

    /* copy the event, ack it right away, then handle the copy */
    memcpy(&event_copy, event, sizeof(*event));
    rdma_ack_cm_event(event);

    if (on_event(&event_copy))
      break;
  }

  rdma_destroy_event_channel(ec);

  return 0;
}

void die(const char *reason)
{
  fprintf(stderr, "%s\n", reason);
  exit(EXIT_FAILURE);
}

void build_context(struct ibv_context *verbs)
{
  if (s_ctx) {
    if (s_ctx->ctx != verbs)
      die("cannot handle events in more than one context.");

    return;
  }

  s_ctx = (struct context *)malloc(sizeof(struct context));

  s_ctx->ctx = verbs;

  TEST_Z(s_ctx->pd = ibv_alloc_pd(s_ctx->ctx));
  TEST_Z(s_ctx->comp_channel = ibv_create_comp_channel(s_ctx->ctx));
  TEST_Z(s_ctx->cq = ibv_create_cq(s_ctx->ctx, 10, NULL, s_ctx->comp_channel, 0)); /* cqe=10 is arbitrary */
  /* arm the CQ so that the next completion raises an event on the completion channel */
  TEST_NZ(ibv_req_notify_cq(s_ctx->cq, 0));

  TEST_NZ(pthread_create(&s_ctx->cq_poller_thread, NULL, poll_cq, NULL));
}

void build_qp_attr(struct ibv_qp_init_attr *qp_attr)
{
  memset(qp_attr, 0, sizeof(*qp_attr));

  qp_attr->send_cq = s_ctx->cq;
  qp_attr->recv_cq = s_ctx->cq;
  qp_attr->qp_type = IBV_QPT_RC; /* reliable connection */

  qp_attr->cap.max_send_wr = 10;
  qp_attr->cap.max_recv_wr = 10;
  qp_attr->cap.max_send_sge = 1;
  qp_attr->cap.max_recv_sge = 1;
}

void * poll_cq(void *ctx)
{
  struct ibv_cq *cq;
  struct ibv_wc wc;

  while (1) {
    /* block until the completion channel reports an event, then re-arm and drain the CQ */
    TEST_NZ(ibv_get_cq_event(s_ctx->comp_channel, &cq, &ctx));
    ibv_ack_cq_events(cq, 1);
    TEST_NZ(ibv_req_notify_cq(cq, 0));

    while (ibv_poll_cq(cq, 1, &wc))
      on_completion(&wc);
  }

  return NULL;
}

void post_receives(struct connection *conn)
{
  struct ibv_recv_wr wr, *bad_wr = NULL;
  struct ibv_sge sge;

  /* wr_id carries the connection pointer back to on_completion */
  wr.wr_id = (uintptr_t)conn;
  wr.next = NULL;
  wr.sg_list = &sge;
  wr.num_sge = 1;

  sge.addr = (uintptr_t)conn->recv_region;
  sge.length = BUFFER_SIZE;
  sge.lkey = conn->recv_mr->lkey;

  TEST_NZ(ibv_post_recv(conn->qp, &wr, &bad_wr));
}

void register_memory(struct connection *conn)
{
  conn->send_region = malloc(BUFFER_SIZE);
  conn->recv_region = malloc(BUFFER_SIZE);

  TEST_Z(conn->send_mr = ibv_reg_mr(
    s_ctx->pd,
    conn->send_region,
    BUFFER_SIZE,
    0));

  /* the receive buffer is written by the NIC, so it needs local write access */
  TEST_Z(conn->recv_mr = ibv_reg_mr(
    s_ctx->pd,
    conn->recv_region,
    BUFFER_SIZE,
    IBV_ACCESS_LOCAL_WRITE));
}

int on_addr_resolved(struct rdma_cm_id *id)
{
  struct ibv_qp_init_attr qp_attr;
  struct connection *conn;

  printf("address resolved.\n");

  build_context(id->verbs);
  build_qp_attr(&qp_attr);

  TEST_NZ(rdma_create_qp(id, s_ctx->pd, &qp_attr));

  id->context = conn = (struct connection *)malloc(sizeof(struct connection));

  conn->id = id;
  conn->qp = id->qp;
  conn->num_completions = 0;

  /* register buffers and post a receive before connecting */
  register_memory(conn);
  post_receives(conn);

  TEST_NZ(rdma_resolve_route(id, TIMEOUT_IN_MS));

  return 0;
}

void on_completion(struct ibv_wc *wc)
{
  struct connection *conn = (struct connection *)(uintptr_t)wc->wr_id;

  if (wc->status != IBV_WC_SUCCESS)
    die("on_completion: status is not IBV_WC_SUCCESS.");

  if (wc->opcode & IBV_WC_RECV)
    printf("received message: %s\n", conn->recv_region);
  else if (wc->opcode == IBV_WC_SEND)
    printf("send completed successfully.\n");
  else
    die("on_completion: completion isn't a send or a receive.");

  /* one send plus one receive completed: the exchange is done, so disconnect */
  if (++conn->num_completions == 2)
    rdma_disconnect(conn->id);
}

int on_connection(void *context)
{
  struct connection *conn = (struct connection *)context;
  struct ibv_send_wr wr, *bad_wr = NULL;
  struct ibv_sge sge;

  snprintf(conn->send_region, BUFFER_SIZE, "message from active/client side with pid %d", getpid());

  printf("connected. posting send...\n");

  memset(&wr, 0, sizeof(wr));

  wr.wr_id = (uintptr_t)conn;
  wr.opcode = IBV_WR_SEND;
  wr.sg_list = &sge;
  wr.num_sge = 1;
  wr.send_flags = IBV_SEND_SIGNALED;

  sge.addr = (uintptr_t)conn->send_region;
  sge.length = BUFFER_SIZE;
  sge.lkey = conn->send_mr->lkey;

  TEST_NZ(ibv_post_send(conn->qp, &wr, &bad_wr));

  return 0;
}

int on_disconnect(struct rdma_cm_id *id)
{
  struct connection *conn = (struct connection *)id->context;

  printf("disconnected.\n");

  rdma_destroy_qp(id);

  ibv_dereg_mr(conn->send_mr);
  ibv_dereg_mr(conn->recv_mr);

  free(conn->send_region);
  free(conn->recv_region);

  free(conn);

  rdma_destroy_id(id);

  return 1; /* exit event loop */
}

int on_event(struct rdma_cm_event *event)
{
  int r = 0;

  if (event->event == RDMA_CM_EVENT_ADDR_RESOLVED)
    r = on_addr_resolved(event->id);
  else if (event->event == RDMA_CM_EVENT_ROUTE_RESOLVED)
    r = on_route_resolved(event->id);
  else if (event->event == RDMA_CM_EVENT_ESTABLISHED)
    r = on_connection(event->id->context);
  else if (event->event == RDMA_CM_EVENT_DISCONNECTED)
    r = on_disconnect(event->id);
  else
    die("on_event: unknown event.");

  return r;
}

int on_route_resolved(struct rdma_cm_id *id)
{
  struct rdma_conn_param cm_params;

  printf("route resolved.\n");

  memset(&cm_params, 0, sizeof(cm_params));
  TEST_NZ(rdma_connect(id, &cm_params));

  return 0;
}

More tutorials
InfiniBand, Verbs, RDMA | https://thegeekinthecorner.wordpress.com/category/infiniband-verbs-rdma/
RDMA read and write with IB verbs | https://thegeekinthecorner.wordpress.com/2010/09/28/rdma-read-and-write-with-ib-verbs/
http://www.hpcadvisorycouncil.com/pdf/building-an-rdma-capable-application-with-ib-verbs.pdf