diff --git a/tools/infer_tool/README.md b/tools/infer_tool/README.md new file mode 100644 index 0000000000000000000000000000000000000000..3d336c5d3190cbf62a3159eaf391d0439a051cc6 --- /dev/null +++ b/tools/infer_tool/README.md @@ -0,0 +1,514 @@ + + +# ais_bench推理工具使用指南 + +## 简介 +本文介绍ais_bench推理工具,用来针对指定的推理模型运行推理程序,并能够测试推理模型的性能(包括吞吐率、时延)。 + +## 工具安装 + +### 环境和依赖 + +- 目前ais_bench推理工具支持trtexec和aclruntime推理后端,使用本工具时确保安装这两个后端,且这两个后端可以正常运行。 +- 安装Python3、Python包模块numpy、tqdm、wheel。 + +### 工具安装方式 + +ais_bench推理工具的安装方式包括:一键式编译安装和源代码编译安装。 + +**说明**: + +- 安装过程中会自动检查和安装python包依赖,确保安装环境要求网络畅通。 +- centos平台默认为gcc 4.8编译器,可能无法安装本工具,建议更新gcc编译器后再安装。 + +#### 一键式编译安装 + 在安装环境执行如下命令安装ais_bench推理程序包: + + ```bash + pip3 install -v 'git+https://gitee.com/aisbench/inference.git#egg=ais_bench&subdirectory=tools/infer_tool/' + ``` + + 说明:若为覆盖安装,请增加“--force-reinstall”参数强制安装,例如: + + ```bash + pip3 install -v --force-reinstall 'git+https://gitee.com/aisbench/inference.git#egg=ais_bench&subdirectory=tools/infer_tool/' + ``` + + 提示如下示例信息则表示安装成功: + + ```bash + Successfully installed ais_bench-{version} + ``` + + + +#### 源代码编译安装 +1. 从代码开源仓[Gitee](git+https://gitee.com/aisbench/inference.git#egg=ais_bench&subdirectory=tools/infer_tool/)克隆/下载工具压缩包“inference-master.zip”。 + +2. 将工具压缩包上传并解压至安装环境。 + +3. 从工具解压目录下进入tools/infer_tool/目录下,执行如下命令进行编译: + + ```bash + # 进入工具解压目录 + cd ${HOME}/tools/infer_tool/ + # 构建ais_bench推理程序包 + pip3 wheel ./ -v + ``` + + 其中,${HOME}为ais_bench推理工具包所在目录。 + + 分别提示如下信息则表示编译成功: + + ```bash + # 成功编译ais_bench推理程序包 + Successfully built ais-bench + ``` + +4. 执行如下命令,进行安装。 + + ```bash + # 安装ais_bench推理程序 + pip3 install ./ais_bench-{version}-py3-none-any.whl + ``` + + {version}表示软件版本号,{python_version}表示Python版本号,{arch}表示CPU架构。 + + 说明:若为覆盖安装,请增加“--force-reinstall”参数强制安装,例如: + + ```bash + pip3 install ./ais_bench-{version}-py3-none-any.whl --force-reinstall + ``` + + 分别提示如下信息则表示安装成功: + + ```bash + # 成功安装ais_bench推理程序 + Successfully installed ais_bench-{version} + ``` + +## 使用方法(以接入aclruntime后端为例) + +### 工具介绍 +ais_bench推理工具的使用方法主要通过命令行使用。 +#### 使用入口 + +ais_bench推理工具可以通过ais_bench可执行文件方式启动模型测试。启动方式如下: + +```bash +python3 -m ais_bench --model +``` + +#### 参数说明 + +ais_bench推理工具可以通过配置不同的参数,来应对各种测试场景以及实现其他辅助功能。 + +参数按照功能类别分为**基础功能参数**和**高级功能参数**: + +- **基础功能参数**:主要包括输入输入文件及格式、debug、推理次数、预热次数、指定运行设备以及帮助信息等。 +- **高级功能参数**:主要包括动态分档场景和动态Shape场景的ais_bench推理测试参数以及profiler或dump数据获取等。 + +**说明**:以下参数中,参数和取值之间可以用“ ”空格分隔也可以用“=”等号分隔。例如:--debug 1或--debug=0。 + +##### 基础功能参数 + +| 参数名 | 说明 | 是否必选 | +| --------------------- | ------------------------------------------------------------ | -------- | +| --model | 需要进行推理的离线模型文件。 | 是 | +| --input | 模型需要的输入。可指定输入文件所在目录或直接指定输入文件。支持输入文件格式为“NPY”、“BIN”。可输入多个文件或目录,文件或目录之间用“,”隔开。具体输入文件请根据模型要求准备。 若不配置该参数,会自动构造输入数据,输入数据类型由--pure_data_type参数决定。 | 否 | +| --pure_data_type | 纯推理数据类型。取值为:“zero”、“random”,默认值为"zero"。 未配置模型输入文件时,工具自动构造输入数据。设置为zero时,构造全为0的纯推理数据;设置为random时,为每一个输入生成一组随机数据。 | 否 | +| --output | 推理结果保存目录。配置后会创建“日期+时间”的子目录,保存输出结果。如果指定output_dirname参数,输出结果将保存到子目录output_dirname下。不配置输出目录时,仅打印输出结果,不保存输出结果。 | 否 | +| --output_dirname | 推理结果保存子目录。设置该值时输出结果将保存到*output/output_dirname*目录下。 配合output参数使用,单独使用无效。 例如:--output */output* --output_dirname *output_dirname* | 否 | +| --outfmt | 输出数据的格式。取值为:“NPY”、“BIN”、“TXT”,默认为”BIN“。 配合output参数使用,单独使用无效。 例如:--output */output* --outfmt NPY。 | 否 | +| --debug | 调试开关。可打印model的desc信息和其他详细执行信息。1或true(开启)、0或false(关闭),默认关闭。 | 否 | +| --run_mode | 
推理执行前的数据加载方式:可取值:array(将数据转换成host侧的ndarray,再调用推理接口推理),files(将文件直接加载进device内,再调用推理接口推理),tensor(将数据加载进device内,再调用推理接口推理),full(将数据转换成host侧的ndarray,再将ndarray格式数据加载进device内,再调用推理接口推理),默认为array。 | 否 | +| --display_all_summary | 是否显示所有的汇总信息,包含h2d和d2h信息。1或true(开启)、0或false(关闭),默认关闭。 | 否 | +| --loop | 推理次数。默认值为1,取值范围为大于0的正整数。 profiler参数配置为true时,推荐配置为1。 | 否 | +| --warmup_count | 推理预热次数。默认值为1,取值范围为大于等于0的整数。配置为0则表示不预热。 | 否 | +| --device | 指定运行设备。根据设备实际的Device ID指定,默认值为0。多Device场景下,可以同时指定多个Device进行推理测试,例如:--device 0,1,2,3。 | 否 | +| --divide_input | 输入数据集切分开关,1或true(开启)、0或false(关闭),默认关闭。多Device场景下,打开时,工具会将数据集平分给这些Device进行推理。| 否 | +| --help | 工具使用帮助信息。 | 否 | + +##### 高级功能参数 + +| 参数名 | 说明 | 是否必选 | +| ------------------------ | ------------------------------------------------------------ | -------- | +| --dymBatch | 动态Batch参数,指定模型输入的实际Batch。
如模型转换时,设置--input_shape="data:-1,600,600,3;img_info:-1,3" --dynamic_batch_size="1,2,4,8",dymBatch参数可设置为:--dymBatch 2。 | 否 | +| --dymHW | 动态分辨率参数,指定模型输入的实际H、W。
如模型转换时,设置--input_shape="data:8,3,-1,-1;img_info:8,4,-1,-1" --dynamic_image_size="300,500;600,800",dymHW参数可设置为:--dymHW 300,500。 | 否 | +| --dymDims | 动态维度参数,指定模型输入的实际Shape。
如模型转换时,设置 --input_shape="data:1,-1;img_info:1,-1" --dynamic_dims="224,224;600,600",dymDims参数可设置为:--dymDims "data:1,600;img_info:1,600"。 | 否 | +| --dymShape | 动态Shape参数,指定模型输入的实际Shape。
如ATC模型转换时,设置--input_shape_range="input1:\[8\~20,3,5,-1\];input2:\[5,3\~9,10,-1\]",dymShape参数可设置为:--dymShape "input1:8,3,5,10;input2:5,3,10,10"。
动态Shape场景下,获取模型的输出size通常为0(即输出数据占内存大小未知),建议设置--outputSize参数。
例如:--dymShape "input1:8,3,5,10;input2:5,3,10,10" --outputSize "10000,10000" | 否 | +| --dymShape_range | 动态Shape的阈值范围。如果设置该参数,那么将根据参数中所有的Shape列表进行依次推理,得到汇总推理信息。
配置格式为:name1:1,3,200\~224,224-230;name2:1,300。其中,name为模型输入名,“\~”表示范围,“-”表示某一位的取值。
也可以指定动态Shape的阈值范围配置文件*.info,该文件中记录动态Shape的阈值范围。 | 否 | +| --outputSize | 指定模型的输出数据所占内存大小,多个输出时,需要为每个输出设置一个值,多个值之间用“,”隔开。
动态Shape场景下,获取到的模型输出size通常为0(即输出数据占用的内存大小未知),需要根据输入Shape预估一个合适的输出内存大小,并通过该参数进行配置。
例如:--dymShape "input1:8,3,5,10;input2:5,3,10,10" --outputSize "10000,10000" | 否 | +| --auto_set_dymdims_mode | 自动设置动态Dims模式。1或true(开启)、0或false(关闭),默认关闭。
针对动态分档Dims模型,根据输入文件的Shape信息自动设置模型的Shape参数;注意输入数据只能为npy文件,bin文件无法读取Shape信息。
配合input参数使用,单独使用无效。
例如:--input 1.npy --auto_set_dymdims_mode 1 | 否 | +| --auto_set_dymshape_mode | 自动设置动态Shape模式。取值为:1或true(开启)、0或false(关闭),默认关闭。
针对动态Shape模型,根据输入文件的Shape信息自动设置模型的Shape参数;注意输入数据只能为npy文件,bin文件无法读取Shape信息。
配合input参数使用,单独使用无效。
例如:--input 1.npy --auto_set_dymshape_mode 1 | 否 | +| --batchsize | 模型batchsize。不输入该值将自动推导。当前推理模块根据模型输入和文件输出自动进行组Batch。参数传递的batchszie有且只用于结果吞吐率计算。自动推导逻辑为尝试获取模型的batchsize时,首先获取第一个参数的最高维作为batchsize; 如果是动态Batch的话,更新为动态Batch的值;如果是动态dims和动态Shape更新为设置的第一个参数的最高维。如果自动推导逻辑不满足要求,请务必传入准确的batchsize值,以计算出正确的吞吐率。 | 否 | +| --output_batchsize_axis | 输出tensor的batchsize轴,默认值为0。输出结果保存文件时,根据哪个轴进行切割推理结果,比如batchsize为2,表示2个输入文件组batch进行推理,那输出结果的batch维度是在哪个轴。默认为0轴,按照0轴进行切割为2份,但是部分模型的输出batch为1轴,所以要设置该值为1。 | 否 | +| --backend|指定trtexec开关。需要指定为trtexec。配合--perf参数使用,单独使用无效。|否| +| --perf|调用trtexec开关。1或true(开启)、0或false(关闭),默认关闭。配合--backend参数使用,单独使用无效。|否| +| --pipeline |指定pipeline开关,用于开启多线程推理功能。1或true(开启)、0或false(关闭),默认关闭。|否| +| --threads |指定threads开关,用于设置多计算线程推理时计算线程的数量。默认值为1,取值范围为大于0的正整数。需要配合--pipeline 1参数使用,单独使用无效。|否| + +### 使用场景 + + #### 纯推理场景 + +默认情况下,构造全为0的数据送入模型推理。 + +示例命令如下: + +```bash +python3 -m ais_bench --model --output ./ --outfmt BIN --loop 5 +``` + +#### 调试模式 +开启debug调试模式。 + +示例命令如下: + +```bash +python3 -m ais_bench --model --output ./ --debug 1 +``` + +调试模式开启后会增加更多的打印信息,包括: +- 模型的输入输出参数信息 + + ```bash + input: + #0 input_ids (1, 384) int32 1536 1536 + #1 input_mask (1, 384) int32 1536 1536 + #2 segment_ids (1, 384) int32 1536 1536 + output: + #0 logits:0 (1, 384, 2) float32 3072 3072 + ``` + +- 详细的推理耗时信息 + + ```bash + [DEBUG] model exec cost : 2.336000 + ``` +- 模型输入输出等具体操作信息 + + #### 文件输入场景 + +使用--input参数指定模型输入文件,多个文件之间通过“,”进行分隔。 + +本场景会根据文件输入size和模型实际输入size进行对比,若缺少数据则会自动构造数据补全,称为组Batch。 + +示例命令如下: + +```bash +python3 -m ais_bench --model --input "./1.bin,./2.bin,./3.bin,./4.bin,./5.bin" +``` + + #### 文件夹输入场景 + +使用input参数指定模型输入文件所在目录,多个目录之间通过“,”进行分隔。 + +本场景会根据文件输入size和模型实际输入size进行组Batch。 + +```bash +python3 -m ais_bench --model --input "./" +``` + +模型输入需要与传入文件夹的个数一致。 + +例如,bert模型有三个输入,则必须传入3个文件夹,且三个文件夹分别对应模型的三个输入,顺序要对应。 +模型输入参数的信息可以通过开启调试模式查看,bert模型的三个输入依次为input_ids、 input_mask、 segment_ids,所以依次传入三个文件夹: + +- 第一个文件夹“./data/SQuAD1.1/input_ids",对应模型第一个参数"input_ids"的输入 +- 第二个文件夹"./data/SQuAD1.1/input_mask",对应第二个输入"input_mask"的输入 +- 第三个文件夹"./data/SQuAD1.1/segment_ids",对应第三个输入"segment_ids"的输入 + +```bash +python3 -m ais_bench --model --input ./data/SQuAD1.1/input_ids,./data/SQuAD1.1/input_mask,./data/SQuAD1.1/segment_ids +``` + + + +#### 多Device场景 + +多Device场景下,可以同时指定多个Device进行推理测试。 + +示例命令如下: + +```bash +python3 -m ais_bench --model --input ./data/ --device 1,2 +``` + +输出结果依次展示每个Device的推理测试结果,示例如下: + +```bash +[INFO] -----------------Performance Summary------------------ +[INFO] NPU_compute_time (ms): min = 2.4769999980926514, max = 3.937000036239624, mean = 3.5538000106811523, median = 3.7230000495910645, percentile(99%) = 3.936680030822754 +[INFO] throughput 1000*batchsize.mean(1)/NPU_compute_time.mean(3.5538000106811523): 281.38893494131406 +[INFO] ------------------------------------------------------ +[INFO] -----------------Performance Summary------------------ +[INFO] NPU_compute_time (ms): min = 3.3889999389648438, max = 3.9230000972747803, mean = 3.616000032424927, median = 3.555000066757202, percentile(99%) = 3.9134000968933105 +[INFO] throughput 1000*batchsize.mean(1)/NPU_compute_time.mean(3.616000032424927): 276.54867008654026 +[INFO] ------------------------------------------------------ +[INFO] multidevice run end qsize:4 result:1 +i:0 device_1 throughput:281.38893494131406 start_time:1676875630.804429 end_time:1676875630.8303885 +i:1 device_2 throughput:276.54867008654026 start_time:1676875630.8043878 end_time:1676875630.8326817 +[INFO] summary throughput:557.9376050278543 +``` + 
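就上面的多Device示例输出而言,summary throughput(吞吐率汇总)约等于各Device吞吐率之和。下面给出一段仅供理解的Python示意代码(数值取自上面的示例输出,并非工具源码):

```python
# 示意:由各Device的吞吐率推算汇总吞吐率(仅为帮助理解,非工具实现)
device_throughputs = {
    1: 281.38893494131406,  # device_1 的 throughput
    2: 276.54867008654026,  # device_2 的 throughput
}

# 各Device并行推理,汇总吞吐率近似为各Device吞吐率之和
summary_throughput = sum(device_throughputs.values())
print(f"summary throughput: {summary_throughput}")  # 约为 557.94,与示例输出一致
```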
+其中结果最后展示每个Device推理测试的throughput(吞吐率)、start_time(测试启动时间)、end_time(测试结束时间)以及summary throughput(吞吐率汇总)。其他详细字段解释请参见本手册的“输出结果”章节。 + + #### 动态分档场景 + +主要包含动态Batch、动态HW(宽高)、动态Dims三种场景,需要分别传入dymBatch、dymHW、dymDims指定实际档位信息。 + +##### 动态Batch + +以档位1 2 4 8档为例,设置档位为2,本程序将获取实际模型输入组Batch,每2个输入为一组,进行组Batch。 + +```bash +python3 -m ais_bench --model --input=./data/ --dymBatch 2 +``` + +##### 动态HW宽高 + +以档位224,224;448,448档为例,设置档位为224,224,本程序将获取实际模型输入组Batch。 + +```bash +python3 -m ais_bench --model --input=./data/ --dymHW 224,224 +``` + +##### 动态Dims + +以设置档位1,3,224,224为例,本程序将获取实际模型输入组Batch。 + +```bash +python3 -m ais_bench --model --input=./data/ --dymDims actual_input_1:1,3,224,224 +``` + +##### 自动设置Dims模式(动态Dims模型) + +动态Dims模型输入数据的Shape可能是不固定的,比如一个输入文件Shape为1,3,224,224,另一个输入文件Shape为 1,3,300,300。若两个文件同时推理,则需要设置两次动态Shape参数,当前不支持该操作。针对该场景,增加auto_set_dymdims_mode模式,可以根据输入文件的Shape信息,自动设置模型的Shape参数。 + +```bash +python3 -m ais_bench --model --input=./data/ --auto_set_dymdims_mode 1 +``` + + +#### 动态Shape场景 + +##### 动态Shape + +以ATC设置[1\~8,3,200\~300,200\~300],设置档位1,3,224,224为例,本程序将获取实际模型输入组Batch。 + +动态Shape的输出大小通常为0,建议通过outputSize参数设置对应输出的内存大小。 + +```bash +python3 -m ais_bench --model --dymShape actual_input_1:1,3,224,224 --outputSize 10000 +``` + +##### 自动设置Shape模式(动态Shape模型) + +动态Shape模型输入数据的Shape可能是不固定的,比如一个输入文件Shape为1,3,224,224 另一个输入文件Shape为 1,3,300,300。若两个文件同时推理,则需要设置两次动态Shape参数,当前不支持该操作。针对该场景,增加auto_set_dymshape_mode模式,可以根据输入文件的Shape信息,自动设置模型的Shape参数。 + +```bash +python3 -m ais_bench --model --outputSize 100000 --auto_set_dymshape_mode 1 --input ./dymdata +``` + +**注意该场景下的输入文件必须为npy格式,如果是bin文件将获取不到真实的Shape信息。** + +##### 动态Shape模型range测试模式 + +输入动态Shape的range范围。对于该范围内的Shape分别进行推理,得出各自的性能指标。 + +以对1,3,224,224 1,3,224,225 1,3,224,226进行分别推理为例,命令如下: + +```bash +python3 -m ais_bench --model --outputSize 100000 --dymShape_range actual_input_1:1,3,224,224~226 +``` + + +#### trtexec场景 + +ais_bench支持ONNX模型推理(集成trtexec),trtexec为NVIDIA TensorRT自带工具,作为推理后端。用户使用ais_bench拉起trtexec工具进行推理性能测试,测试过程中实时输出trtexec日志,打印在控制台,推理性能测试完成后,将性能数据输出在控制台。 +##### 前置条件 +推理性能测试环境需要配置有GPU,安装 CUDA及TensorRT,并且trtexec可以通过命令行调用到,安装方式可参考[TensorRT](https://github.com/NVIDIA/TensorRT)。 + +示例命令如下: + +```bash +python3 -m ais_bench --model --backend trtexec --perf 1 +``` + +输出结果推理测试结果,示例如下: + +```bash +[INFO] [05/27/2023-12:05:31] [I] === Performance summary === +[INFO] [05/27/2023-12:05:31] [I] Throughput: 120.699 qps +[INFO] [05/27/2023-12:05:31] [I] Latency: min = 9.11414 ms, max = 11.7442 ms, mean = 9.81005 ms, median = 9.76404 ms, percentile(90%) = 10.1075 ms, percentile(95%) = 10.1624 ms, percentile(99%) = 11.4742 ms +[INFO] [05/27/2023-12:05:31] [I] Enqueue Time: min = 0.516296 ms, max = 0.598633 ms, mean = 0.531443 ms, median = 0.5271 ms, percentile(90%) = 0.546875 ms, percentile(95%) = 0.564575 ms, percentile(99%) = 0.580566 ms +[INFO] [05/27/2023-12:05:31] [I] H2D Latency: min = 1.55066 ms, max = 1.57336 ms, mean = 1.55492 ms, median = 1.55444 ms, percentile(90%) = 1.55664 ms, percentile(95%) = 1.55835 ms, percentile(99%) = 1.56458 ms +[INFO] [05/27/2023-12:05:31] [I] GPU Compute Time: min = 7.54407 ms, max = 10.1723 ms, mean = 8.23978 ms, median = 8.19409 ms, percentile(90%) = 8.5354 ms, percentile(95%) = 8.59131 ms, percentile(99%) = 9.90002 ms +[INFO] [05/27/2023-12:05:31] [I] D2H Latency: min = 0.0130615 ms, max = 0.0170898 ms, mean = 0.015342 ms, median = 0.0153809 ms, percentile(90%) = 0.0162354 ms, percentile(95%) = 0.0163574 ms, percentile(99%) = 0.0168457 ms +[INFO] [05/27/2023-12:05:31] [I] Total Host Walltime: 3.02405 s +[INFO] 
[05/27/2023-12:05:31] [I] Total GPU Compute Time: 3.00752 s +``` + +**字段说明** + +| 字段 | 说明 | +| --------------------- | ------------------------------------------------------------ | +| Throughput | 吞吐率。 | +| Latency | H2D延迟、GPU计算时间和D2H延迟的总和。这是推断单个执行的延迟。 | +| min | 推理执行时间最小值。 | +| max | 推理执行时间最大值。 | +| mean | 推理执行时间平均值。 | +| median | 推理执行时间取中位数。 | +| percentile(99%) | 推理执行时间中的百分位数。 | +| H2D Latency | 单个执行的输入张量的主机到设备数据传输的延迟。 | +| GPU Compute Time | 为执行CUDA内核的GPU延迟。 | +| D2H Latency | 单个执行的输出张量的设备到主机数据传输的延迟。 | +| Total Host Walltime | 从第一个执行(预热后)入队到最后一个执行完成的主机时间。 | +| Total GPU Compute Time| 所有执行的GPU计算时间的总和。 | + + #### 输出结果文件保存场景 + +默认情况下,ais_bench推理工具执行后不保存输出结果数据文件,配置相关参数后,可生成的结果数据如下: + +| 文件/目录 | 说明 | +| ---------------------------------------- | ------------------------------------------------------------ | +| {文件名}.bin、{文件名}.npy或{文件名}.txt | 模型推理输出结果文件。
文件命名格式:名称_输出序号.后缀。不指定input时(纯推理),名称固定为“pure_infer_data”;指定input时,名称取自第一个输入对应的输入文件名(不含扩展名);输出序号从0开始,按输出的先后顺序排列;文件名后缀由--outfmt参数控制。
默认情况下,会在--output参数指定的目录下创建“日期+时间”的目录,并将结果文件保存在该目录下;当指定了--output_dirname时,结果文件将直接保存在--output_dirname参数指定的目录下。
指定--output_dirname参数时,多次执行工具推理会导致结果文件因同名而覆盖。 | +| xx_summary.json | 工具输出模型性能结果数据。默认情况下,“xx”以“日期+时间”命名;当指定了--output_dirname时,“xx”以--output_dirname指定的目录名称命名。
指定--output_dirname参数时,多次执行工具推理会导致结果文件因同名而覆盖。 | +| dump | dump数据文件目录。使用--dump开启dump时,在--output参数指定的目录下创建dump目录,保存dump数据文件。 | +| profiler | Profiler采集性能数据文件目录。使用--profiler开启性能数据采集时,在--output参数指定的目录下创建profiler目录,保存性能数据文件。 | + +- 仅设置--output参数。示例命令及结果如下: + + ```bash + python3 -m ais_bench --model ./pth_resnet50_bs1.om --output ./result + ``` + + ```bash + result + |-- 2022_12_17-07_37_18 + │   `-- pure_infer_data_0.bin + `-- 2022_12_17-07_37_18_summary.json + ``` + +- 设置--input和--output参数。示例命令及结果如下: + + ```bash + # 输入的input文件夹内容如下 + ls ./data + 196608-0.bin 196608-1.bin 196608-2.bin 196608-3.bin 196608-4.bin 196608-5.bin 196608-6.bin 196608-7.bin 196608-8.bin 196608-9.bin + ``` + + ```bash + python3 -m ais_bench --model ./pth_resnet50_bs1.om --input ./data --output ./result + ``` + + ```bash + result/ + |-- 2023_01_03-06_35_53 + | |-- 196608-0_0.bin + | |-- 196608-1_0.bin + | |-- 196608-2_0.bin + | |-- 196608-3_0.bin + | |-- 196608-4_0.bin + | |-- 196608-5_0.bin + | |-- 196608-6_0.bin + | |-- 196608-7_0.bin + | |-- 196608-8_0.bin + | `-- 196608-9_0.bin + `-- 2023_01_03-06_35_53_summary.json + ``` + +- 设置--output_dirname参数。示例命令及结果如下: + + ```bash + python3 -m ais_bench --model --output ./result --output_dirname subdir + ``` + + ```bash + result + |-- subdir + │   `-- pure_infer_data_0.bin + `-- subdir_summary.json + ``` + +- 设置--dump参数。示例命令及结果如下: + + ```bash + python3 -m ais_bench --model --output ./result --dump 1 + ``` + + ```bash + result + |-- 2022_12_17-07_37_18 + │   `-- pure_infer_data_0.bin + |-- dump + `-- 2022_12_17-07_37_18_summary.json + ``` + +- 设置--profiler参数。示例命令及结果如下: + + ```bash + python3 -m ais_bench --model --output ./result --profiler 1 + ``` + + ```bash + result + |-- 2022_12_17-07_56_10 + │   `-- pure_infer_data_0.bin + |-- profiler + │   `-- PROF_000001_20221217075609326_GLKQJOGROQGOLIIB + `-- 2022_12_17-07_56_10_summary.json + ``` + +#### 多线程推理场景 + + ```bash + python3 -m ais_bench --model --pipeline 1 + ``` + 在单线程推理的命令行基础上加上--pipeline 1即可开启多线程推理模式,实现计算-搬运的并行,加快端到端推理速度。 + + ```bash + python3 -m ais_bench --model --pipeline 1 --threads 2 + ``` + 在多线程推理的命令行基础上加上--threads {$number of threads},即可开启多计算线程推理模式,实现计算-计算的并行,提高推理吞吐量。 + +### 输出结果 + +ais_bench推理工具执行后,打屏输出结果示例如下: + +- display_all_summary=False时,打印如下: + + ```bash + [INFO] -----------------Performance Summary------------------ + [INFO] NPU_compute_time (ms): min = 0.6610000133514404, max = 0.6610000133514404, mean = 0.6610000133514404, median = 0.6610000133514404, percentile(99%) = 0.6610000133514404 + [INFO] throughput 1000*batchsize.mean(1)/NPU_compute_time.mean(0.6610000133514404): 1512.8592735267011 + [INFO] ------------------------------------------------------ + ``` + +- display_all_summary=True时,打印如下: + + ```bash + [INFO] -----------------Performance Summary------------------ + [INFO] H2D_latency (ms): min = 0.05700000002980232, max = 0.05700000002980232, mean = 0.05700000002980232, median = 0.05700000002980232, percentile(99%) = 0.05700000002980232 + [INFO] NPU_compute_time (ms): min = 0.6650000214576721, max = 0.6650000214576721, mean = 0.6650000214576721, median = 0.6650000214576721, percentile(99%) = 0.6650000214576721 + [INFO] D2H_latency (ms): min = 0.014999999664723873, max = 0.014999999664723873, mean = 0.014999999664723873, median = 0.014999999664723873, percentile(99%) = 0.014999999664723873 + [INFO] throughput 1000*batchsize.mean(1)/NPU_compute_time.mean(0.6650000214576721): 1503.759349974173 + ``` + +通过输出结果可以查看模型执行耗时、吞吐率。耗时越小、吞吐率越高,则表示该模型性能越高。 + +**字段说明** + +| 字段 | 说明 | +| --------------------- | 
------------------------------------------------------------ | +| H2D_latency (ms) | Host to Device的内存拷贝耗时。单位为ms。 | +| min | 推理执行时间最小值。 | +| max | 推理执行时间最大值。 | +| mean | 推理执行时间平均值。 | +| median | 推理执行时间取中位数。 | +| percentile(99%) | 推理执行时间中的百分位数。 | +| NPU_compute_time (ms) | NPU推理计算的时间。单位为ms。 | +| D2H_latency (ms) | Device to Host的内存拷贝耗时。单位为ms。 | +| throughput | 吞吐率。吞吐率计算公式:1000 *batchsize/npu_compute_time.mean | +| batchsize | 批大小。本工具不一定能准确识别当前样本的batchsize,建议通过--batchsize参数进行设置。 | diff --git a/tools/infer_tool/__init__.py b/tools/infer_tool/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..86c34465080b1a393e796e85b5bc1d0e49f3ffea --- /dev/null +++ b/tools/infer_tool/__init__.py @@ -0,0 +1,16 @@ +# Copyright (c) 2023-2023 Huawei Technologies Co., Ltd. +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from components.utils.parser import load_command_instance + +benchmark_cmd = load_command_instance('benchmark_sub_task') \ No newline at end of file diff --git a/tools/infer_tool/ais_bench/__init__.py b/tools/infer_tool/ais_bench/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/tools/infer_tool/ais_bench/__main__.py b/tools/infer_tool/ais_bench/__main__.py new file mode 100644 index 0000000000000000000000000000000000000000..123cffcf5ffa21ddc647eed2e831d3d2d6b267d8 --- /dev/null +++ b/tools/infer_tool/ais_bench/__main__.py @@ -0,0 +1,18 @@ +# Copyright (c) 2023-2023 Huawei Technologies Co., Ltd. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +import os +cur_path = os.path.dirname(os.path.realpath(__file__)) +exec(open(os.path.join(cur_path, "infer/__main__.py")).read()) \ No newline at end of file diff --git a/tools/infer_tool/ais_bench/infer/__init__.py b/tools/infer_tool/ais_bench/infer/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/tools/infer_tool/ais_bench/infer/__main__.py b/tools/infer_tool/ais_bench/infer/__main__.py new file mode 100644 index 0000000000000000000000000000000000000000..2359a582477e9ec52b8756baab5cad044d90f184 --- /dev/null +++ b/tools/infer_tool/ais_bench/infer/__main__.py @@ -0,0 +1,281 @@ +# Copyright (c) 2023-2023 Huawei Technologies Co., Ltd. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import argparse +import os +import re +from ais_bench.infer.infer_process import infer_process +from ais_bench.infer.args_adapter import AISBenchInferArgsAdapter +from ais_bench.infer.args_check import ( + check_dym_string, check_dym_range_string, check_number_list, str2bool, check_positive_integer, + check_batchsize_valid, check_nonnegative_integer, check_device_range_valid, check_om_path_legality, + check_input_path_legality, check_output_path_legality, check_acl_json_path_legality, + check_aipp_config_path_legality +) + + +def get_args(): + parser = argparse.ArgumentParser() + parser.add_argument( + "--model", + "-m", + type=check_om_path_legality, + required=True, + help="The path of the om model" + ) + parser.add_argument( + "--input", + "-i", + type=check_input_path_legality, + default=None, + help="Input file or dir" + ) + parser.add_argument( + "--output", + "-o", + type=check_output_path_legality, + default=None, + help="Inference data output path. The inference results are output to \ + the subdirectory named current date under given output path" + ) + parser.add_argument( + "--output_dirname", + type=check_output_path_legality, + default=None, + help="Actual output directory name. \ + Used with parameter output, cannot be used alone. \ + The inference result is output to subdirectory named by output_dirname \ + under output path. such as --output_dirname 'tmp', \ + the final inference results are output to the folder of {$output}/tmp" + ) + parser.add_argument( + "--outfmt", + default="BIN", + choices=["NPY", "BIN", "TXT"], + help="Output file format (NPY or BIN or TXT)" + ) + parser.add_argument( + "--loop", + "-l", + type=check_positive_integer, + default=1, + help="The round of the PureInfer." 
+ ) + parser.add_argument( + "--debug", + type=str2bool, + default=False, + help="Debug switch,print model information" + ) + parser.add_argument( + "--device", + "-d", + type=check_device_range_valid, + default=0, + help="The NPU device ID to use.valid value range is [0, 255]" + ) + parser.add_argument( + "--dymBatch", + dest="dym_batch", + type=check_positive_integer, + default=0, + help="Dynamic batch size param,such as --dymBatch 2" + ) + parser.add_argument( + "--dymHW", + dest="dym_hw", + type=check_dym_string, + default=None, + help="Dynamic image size param, such as --dymHW \"300,500\"" + ) + parser.add_argument( + "--dymDims", + dest="dym_dims", + type=check_dym_string, + default=None, + help="Dynamic dims param, such as --dymDims \"data:1,600;img_info:1,600\"" + ) + parser.add_argument( + "--dymShape", + "--dym-shape", + dest="dym_shape", + type=check_dym_string, + default=None, + help="Dynamic shape param, such as --dymShape \"data:1,600;img_info:1,600\"" + ) + parser.add_argument( + "--outputSize", + dest="output_size", + type=check_number_list, + default=None, + help="Output size for dynamic shape mode" + ) + parser.add_argument( + "--auto_set_dymshape_mode", + type=str2bool, + default=False, + help="Auto_set_dymshape_mode" + ) + parser.add_argument( + "--auto_set_dymdims_mode", + type=str2bool, + default=False, + help="Auto_set_dymdims_mode" + ) + parser.add_argument( + "--batchsize", + type=check_batchsize_valid, + default=None, + help="Batch size of input tensor" + ) + parser.add_argument( + "--pure_data_type", + type=str, + default="zero", + choices=["zero", "random"], + help="Null data type for pure inference(zero or random)" + ) + parser.add_argument( + "--profiler", + type=str2bool, + default=False, + help="Profiler switch" + ) + parser.add_argument( + "--dump", + type=str2bool, + default=False, + help="Dump switch" + ) + parser.add_argument( + "--acl_json_path", + type=check_acl_json_path_legality, + default=None, + help="Acl json path for profiling or dump" + ) + parser.add_argument( + "--output_batchsize_axis", + type=check_nonnegative_integer, + default=0, + help="Splitting axis number when outputing tensor results, such as --output_batchsize_axis 1" + ) + parser.add_argument( + "--run_mode", + type=str, + default="array", + choices=["array", "files", "tensor", "full"], + help="Run mode" + ) + parser.add_argument( + "--display_all_summary", + type=str2bool, + default=False, + help="Display all summary include h2d d2h info" + ) + parser.add_argument( + "--warmup_count", + "--warmup-count", + type=check_nonnegative_integer, + default=1, + help="Warmup count before inference" + ) + parser.add_argument( + "--dymShape_range", + dest="dym_shape_range", + type=check_dym_range_string, + default=None, + help="Dynamic shape range, such as --dymShape_range \"data:1,600~700;img_info:1,600-700\"" + ) + parser.add_argument( + "--aipp_config", + type=check_aipp_config_path_legality, + default=None, + help="File type: .config, to set actual aipp params before infer" + ) + parser.add_argument( + "--energy_consumption", + type=str2bool, + default=False, + help="Obtain power consumption data for model inference" + ) + parser.add_argument( + "--npu_id", + type=check_nonnegative_integer, + default=0, + help="The NPU ID to use.valid value range is [0, 255]" + ) + parser.add_argument( + "--backend", + type=str, + default=None, + choices=["trtexec"], + help="Backend trtexec" + ) + parser.add_argument( + "--perf", + type=str2bool, + default=False, + help="Perf switch" + ) + 
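    # The remaining options cover multi-threaded (pipeline) inference, profiler/dump
    # post-processing and multi-device input splitting.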
parser.add_argument( + "--pipeline", + type=str2bool, + default=False, + help="Pipeline switch" + ) + parser.add_argument( + "--profiler_rename", + type=str2bool, + default=True, + help="Profiler rename switch" + ) + parser.add_argument( + "--dump_npy", + type=str2bool, + default=False, + help="dump data convert to npy" + ) + parser.add_argument( + "--divide_input", + type=str2bool, + default=False, + help="Input datas need to be divided to match multi devices or not, \ + --device should be list, default False" + ) + parser.add_argument( + '--threads', + dest='threads', + type=check_positive_integer, + default=1, + help="Number of threads for computing. \ + need to set --pipeline when setting threads number to be more than one." + ) + benchmark_args = parser.parse_args() + + return benchmark_args + + +if __name__ == "__main__": + args = get_args() + + args = AISBenchInferArgsAdapter(args.model, args.input, args.output, + args.output_dirname, args.outfmt, args.loop, args.debug, args.device, + args.dym_batch, args.dym_hw, args.dym_dims, args.dym_shape, args.output_size, + args.auto_set_dymshape_mode, args.auto_set_dymdims_mode, args.batchsize, args.pure_data_type, + args.profiler, args.dump, args.acl_json_path, args.output_batchsize_axis, args.run_mode, + args.display_all_summary, args.warmup_count, args.dym_shape_range, args.aipp_config, + args.energy_consumption, args.npu_id, args.backend, args.perf, args.pipeline, args.profiler_rename, + args.dump_npy, args.divide_input, args.threads) + ret = infer_process(args) + exit(ret) diff --git a/tools/infer_tool/ais_bench/infer/args_adapter.py b/tools/infer_tool/ais_bench/infer/args_adapter.py new file mode 100644 index 0000000000000000000000000000000000000000..a7c24d9412882103765e141d7a3d93ce7e8d366d --- /dev/null +++ b/tools/infer_tool/ais_bench/infer/args_adapter.py @@ -0,0 +1,96 @@ +# Copyright (c) 2023-2023 Huawei Technologies Co., Ltd. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
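# Plain value object bundling every benchmark command line option. __main__ builds it from
# the parsed argparse namespace and hands it to infer_process(); get_all_args_dict() rebuilds
# the corresponding '--flag: value' mapping from the stored fields.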
+ +class AISBenchInferArgsAdapter(): + def __init__(self, model, input_path, output, output_dirname, outfmt, loop, + debug, device, dym_batch, dym_hw, dym_dims, + dym_shape, output_size, auto_set_dymshape_mode, + auto_set_dymdims_mode, batchsize, pure_data_type, + profiler, dump, acl_json_path, output_batchsize_axis, + run_mode, display_all_summary, warmup_count, dym_shape_range, aipp_config, + energy_consumption, npu_id, backend, perf, pipeline, profiler_rename, + dump_npy, divide_input, threads): + self.model = model + self.input = input_path + self.output = output + self.output_dirname = output_dirname + self.outfmt = outfmt + self.loop = loop + self.debug = debug + self.device = device + self.dym_batch = dym_batch + self.dym_hw = dym_hw + self.dym_dims = dym_dims + self.dym_shape = dym_shape + self.output_size = output_size + self.auto_set_dymshape_mode = auto_set_dymshape_mode + self.auto_set_dymdims_mode = auto_set_dymdims_mode + self.batchsize = batchsize + self.pure_data_type = pure_data_type + self.profiler = profiler + self.dump = dump + self.acl_json_path = acl_json_path + self.output_batchsize_axis = output_batchsize_axis + self.run_mode = run_mode + self.display_all_summary = display_all_summary + self.warmup_count = warmup_count + self.dym_shape_range = dym_shape_range + self.aipp_config = aipp_config + self.energy_consumption = energy_consumption + self.npu_id = npu_id + self.backend = backend + self.perf = perf + self.pipeline = pipeline + self.profiler_rename = profiler_rename + self.dump_npy = dump_npy + self.divide_input = divide_input + self.threads = threads + + def get_all_args_dict(self): + args_dict = {} + args_dict.update({'--model':self.model}) + args_dict.update({'--input':self.input}) + args_dict.update({'--output':self.output}) + args_dict.update({'--output_dirname':self.output_dirname}) + args_dict.update({'--outfmt':self.outfmt}) + args_dict.update({'--loop':self.loop}) + args_dict.update({'--debug':self.debug}) + args_dict.update({'--device':self.device}) + args_dict.update({'--dymBatch':self.dym_batch}) + args_dict.update({'--dymHW':self.dym_hw}) + args_dict.update({'--dymDims':self.dym_dims}) + args_dict.update({'--dymShape':self.dym_shape}) + args_dict.update({'--outputSize':self.output_size}) + args_dict.update({'--auto_set_dymshape_mode':self.auto_set_dymshape_mode}) + args_dict.update({'--auto_set_dymdims_mode':self.auto_set_dymdims_mode}) + args_dict.update({'--batchsize':self.batchsize}) + args_dict.update({'--pure_data_type':self.pure_data_type}) + args_dict.update({'--profiler':self.profiler}) + args_dict.update({'--dump':self.dump}) + args_dict.update({'--acl_json_path':self.acl_json_path}) + args_dict.update({'--output_batchsize_axis':self.output_batchsize_axis}) + args_dict.update({'--run_mode':self.run_mode}) + args_dict.update({'--display_all_summary':self.display_all_summary}) + args_dict.update({'--warmup_count':self.warmup_count}) + args_dict.update({'--dymShape_range':self.dym_shape_range}) + args_dict.update({'--aipp_config':self.aipp_config}) + args_dict.update({'--energy_consumption':self.energy_consumption}) + args_dict.update({'--npu_id':self.npu_id}) + args_dict.update({'--perf':self.perf}) + args_dict.update({'--pipeline':self.pipeline}) + args_dict.update({'--profiler_rename':self.profiler_rename}) + args_dict.update({'--dump_npy':self.dump_npy}) + args_dict.update({'--divide_input':self.divide_input}) + args_dict.update({'--threads':self.threads}) + return args_dict \ No newline at end of file diff --git 
a/tools/infer_tool/ais_bench/infer/args_check.py b/tools/infer_tool/ais_bench/infer/args_check.py new file mode 100644 index 0000000000000000000000000000000000000000..1093fa422fa7eb361f6511c70216b35100396a90 --- /dev/null +++ b/tools/infer_tool/ais_bench/infer/args_check.py @@ -0,0 +1,194 @@ +import os +import re +import argparse +from ais_bench.infer.common.path_security_check import FileStat + +OM_MODEL_MAX_SIZE = 10 * 1024 * 1024 * 1024 # 10GB +ACL_JSON_MAX_SIZE = 8 * 1024 # 8KB +AIPP_CONFIG_MAX_SIZE = 12.5 * 1024 # 12.5KB + + +def check_dym_string(value): + if not value: + return value + dym_string = value + regex = re.compile(r"[^_A-Za-z0-9,;:/.-]") + if regex.search(dym_string): + raise argparse.ArgumentTypeError(f"dym string \"{dym_string}\" is not a legal string") + return dym_string + + +def check_dym_range_string(value): + if not value: + return value + dym_string = value + regex = re.compile(r"[^_A-Za-z0-9,;:/.\-~]") + if regex.search(dym_string): + raise argparse.ArgumentTypeError(f"dym range string \"{dym_string}\" is not a legal string") + return dym_string + + +def check_number_list(value): + if not value: + return value + number_list = value + regex = re.compile(r"[^0-9,;]") + if regex.search(number_list): + raise argparse.ArgumentTypeError(f"number_list \"{number_list}\" is not a legal list") + return number_list + + +def str2bool(v): + if isinstance(v, bool): + return v + if v.lower() in ('yes', 'true', 't', 'y', '1'): + return True + elif v.lower() in ('no', 'false', 'f', 'n', '0'): + return False + else: + raise argparse.ArgumentTypeError('Boolean value expected true, 1, false, 0 with case insensitive.') + + +def check_positive_integer(value): + ivalue = int(value) + if ivalue <= 0: + raise argparse.ArgumentTypeError("%s is an invalid positive int value" % value) + return ivalue + + +def check_batchsize_valid(value): + # default value is None + if value is None: + return value + # input value no None + else: + return check_positive_integer(value) + + +def check_nonnegative_integer(value): + ivalue = int(value) + if ivalue < 0: + raise argparse.ArgumentTypeError("%s is an invalid nonnegative int value" % value) + return ivalue + + +def check_npu_id_range_vaild(value): + # if contain , split to int list + min_value = 0 + max_value = 2048 + if ',' in value: + ilist = [int(v) for v in value.split(',')] + for ivalue in ilist: + if ivalue < min_value or ivalue > max_value: + raise argparse.ArgumentTypeError("{} of npu_id:{} is invalid. valid value range is [{}, {}]".format( + ivalue, value, min_value, max_value)) + return ilist + else: + # default as single int value + ivalue = int(value) + if ivalue < min_value or ivalue > max_value: + raise argparse.ArgumentTypeError("npu_id:{} is invalid. valid value range is [{}, {}]".format( + ivalue, min_value, max_value)) + return ivalue + + +def check_device_range_valid(value): + # if contain , split to int list + min_value = 0 + max_value = 255 + try: + # Check if the value contains a comma; if so, split into a list of integers + if ',' in value: + ilist = [int(v) for v in value.split(',')] + for ivalue in ilist: + if ivalue < min_value or ivalue > max_value: + raise argparse.ArgumentTypeError("{} of device:{} is invalid. valid value range is [{}, {}]".format( + ivalue, value, min_value, max_value)) + return ilist + else: + # default as single int value + ivalue = int(value) + if ivalue < min_value or ivalue > max_value: + raise argparse.ArgumentTypeError("device:{} is invalid. 
valid value range is [{}, {}]".format( + ivalue, min_value, max_value)) + return ivalue + except ValueError: + raise argparse.ArgumentTypeError("Argument npu-id invalid input value: {}. " + "Please provide a valid integer or a comma-separated list of integers.".format(value)) + + + +def check_om_path_legality(value): + path_value = value + try: + file_stat = FileStat(path_value) + except Exception as err: + raise argparse.ArgumentTypeError(f"om path:{path_value} is illegal. Please check.") from err + if not file_stat.is_basically_legal('read'): + raise argparse.ArgumentTypeError(f"om path:{path_value} is illegal. Please check.") + if not file_stat.is_legal_file_type(["om"]): + raise argparse.ArgumentTypeError(f"om path:{path_value} is illegal. Please check.") + if not file_stat.is_legal_file_size(OM_MODEL_MAX_SIZE): + raise argparse.ArgumentTypeError(f"om path:{path_value} is illegal. Please check.") + return path_value + + +def check_input_path_legality(value): + if not value: + return value + inputs_list = value.split(',') + for input_path in inputs_list: + try: + file_stat = FileStat(input_path) + except Exception as err: + raise argparse.ArgumentTypeError(f"input path:{input_path} is illegal. Please check.") from err + if not file_stat.is_basically_legal('read'): + raise argparse.ArgumentTypeError(f"input path:{input_path} is illegal. Please check.") + return value + + +def check_output_path_legality(value): + if not value: + return value + path_value = value + try: + file_stat = FileStat(path_value) + except Exception as err: + raise argparse.ArgumentTypeError(f"weight path:{path_value} is illegal. Please check.") from err + if not file_stat.is_basically_legal("write"): + raise argparse.ArgumentTypeError(f"output path:{path_value} is illegal. Please check.") + return path_value + + +def check_acl_json_path_legality(value): + if not value: + return value + path_value = value + try: + file_stat = FileStat(path_value) + except Exception as err: + raise argparse.ArgumentTypeError(f"acl json path:{path_value} is illegal. Please check.") from err + if not file_stat.is_basically_legal('read'): + raise argparse.ArgumentTypeError(f"acl json path:{path_value} is illegal. Please check.") + if not file_stat.is_legal_file_type(["json"]): + raise argparse.ArgumentTypeError(f"acl json path:{path_value} is illegal. Please check.") + if not file_stat.is_legal_file_size(ACL_JSON_MAX_SIZE): + raise argparse.ArgumentTypeError(f"acl json path:{path_value} is illegal. Please check.") + return path_value + + +def check_aipp_config_path_legality(value): + if not value: + return value + path_value = value + try: + file_stat = FileStat(path_value) + except Exception as err: + raise argparse.ArgumentTypeError(f"aipp config path:{path_value} is illegal. Please check.") from err + if not file_stat.is_basically_legal('read'): + raise argparse.ArgumentTypeError(f"aipp config path:{path_value} is illegal. Please check.") + if not file_stat.is_legal_file_type(["config"]): + raise argparse.ArgumentTypeError(f"aipp config path:{path_value} is illegal. Please check.") + if not file_stat.is_legal_file_size(AIPP_CONFIG_MAX_SIZE): + raise argparse.ArgumentTypeError(f"aipp config path:{path_value} is illegal. 
Please check.") + return path_value \ No newline at end of file diff --git a/tools/infer_tool/ais_bench/infer/backends/__init__.py b/tools/infer_tool/ais_bench/infer/backends/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e78cd1be1f30b4826f6e2da993ddaee1372f06cb --- /dev/null +++ b/tools/infer_tool/ais_bench/infer/backends/__init__.py @@ -0,0 +1,30 @@ +# Copyright (c) 2023-2023 Huawei Technologies Co., Ltd. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +import os + +from ais_bench.infer import registry + +BACKEND_REGISTRY = registry.Registry("BACKEND_REGISTRY") + +registry.import_all_modules_for_register( + os.path.dirname(os.path.abspath(__file__)), "ais_bench.infer.backends" +) + + +class BackendFactory: + @staticmethod + def create_backend(name): + return BACKEND_REGISTRY[name] \ No newline at end of file diff --git a/tools/infer_tool/ais_bench/infer/backends/backend.py b/tools/infer_tool/ais_bench/infer/backends/backend.py new file mode 100644 index 0000000000000000000000000000000000000000..88476f05d171ca5e5468b2ee715d4b54274855a3 --- /dev/null +++ b/tools/infer_tool/ais_bench/infer/backends/backend.py @@ -0,0 +1,123 @@ +# Copyright (c) 2023-2023 Huawei Technologies Co., Ltd. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +from __future__ import annotations + +from abc import ABC, abstractmethod +from typing import List, Any, Iterable, Union + +import attrs + + +@attrs.define +class AccuracyResult: + output: Any = None + label: Any = None + prediction: Any = None + + +@attrs.define +class PerformanceStats: + min: float = None + max: float = None + mean: float = None + median: float = None + percentile: float = None + + +@attrs.define +class PerformanceResult: + h2d_latency: PerformanceStats = None + compute_time: PerformanceStats = None + d2h_latency: PerformanceStats = None + host_wall_time: float = None + throughput: float = None + + +@attrs.define +class InferenceTrace: + h2d_start: float = None + h2d_end: float = None + compute_start: float = None + compute_end: float = None + d2h_start: float = None + d2h_end: float = None + + +class Backend(ABC): + """ + Backend interface + """ + + @property + @abstractmethod + def name(self) -> str: + """ + Each of the subclasses must implement this. + This is called to return the name of backend. 
+ """ + + @property + def model_extension(self) -> str: + return "model" + + def initialize(self) -> bool: + """ + init the resource of backend + """ + return True + + def finalize(self) -> None: + """ + release the resource of backend + """ + pass + + @abstractmethod + def load(self, model_path: str) -> Backend: + """ + Each of the subclases must implement this. + This is called to load a model. + """ + + @abstractmethod + def warm_up(self, dataloader: Iterable, iterations: int = 100) -> None: + """ + Each of the subclases must implement this. + This is called to warmup. + """ + + @abstractmethod + def predict( + self, dataloader: Iterable + ) -> Union[List[AccuracyResult], None]: + """ + Each of the subclasses must implement this. + This is called to inference a model + """ + + @abstractmethod + def build(self) -> None: + """ + Each of the subclasses must implement this. + This is called to build a model + """ + + @abstractmethod + def get_perf(self) -> PerformanceResult: + """ + Each of the subclasses must implement this. + This is called to get the performance of the model inference. + """ \ No newline at end of file diff --git a/tools/infer_tool/ais_bench/infer/backends/backend_trtexec.py b/tools/infer_tool/ais_bench/infer/backends/backend_trtexec.py new file mode 100644 index 0000000000000000000000000000000000000000..d1d92367c0753a9687befe66c7f52845051fae43 --- /dev/null +++ b/tools/infer_tool/ais_bench/infer/backends/backend_trtexec.py @@ -0,0 +1,154 @@ +# Copyright (c) 2023-2023 Huawei Technologies Co., Ltd. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
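# Backend wrapping the NVIDIA trtexec CLI: run() launches trtexec in a subprocess and streams
# its log to the console; get_perf() then parses the performance summary (Throughput, H2D/D2H
# Latency, GPU Compute Time, Total Host Walltime) from the captured log.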
+ + +from __future__ import annotations + +import os +import sys +import logging +import subprocess +import re +from typing import Iterable, List, Dict, Any + +from ais_bench.infer.backends import backend, BACKEND_REGISTRY +from ais_bench.infer.backends.backend import AccuracyResult, PerformanceStats, PerformanceResult, InferenceTrace +from ais_bench.infer.common.utils import logger + + +class TrtexecConfig(object): + def __init__(self): + self.iterations = None + self.warmup = None + self.duration = None + self.batch = None + self.device = None + + +logging.basicConfig(stream=sys.stdout, level=logging.INFO, format='[%(levelname)s] %(message)s') +logger = logging.getLogger(__name__) + + +@BACKEND_REGISTRY.register("trtexec") +class BackendTRTExec(backend.Backend): + def __init__(self, config: Any = None) -> None: + super(BackendTRTExec, self).__init__() + self.config = TrtexecConfig() + self.convert_config(config) + self.model_path = "" + self.output_log = "" + self.trace = InferenceTrace() + + @property + def name(self) -> str: + return "trtexec" + + @property + def model_extension(self) -> str: + return "plan" + + def convert_config(self, config): + if config.loop is not None: + self.config.iterations = config.loop + if config.warmup_count is not None: + self.config.warmup_count = config.warmup_count + if config.batchsize is not None: + self.config.batch = config.batchsize + if config.device is not None: + self.config.device = config.device + + def load( + self, model_path: str, inputs: list = None, outputs: list = None + ) -> BackendTRTExec: + if os.path.exists(model_path): + logger.info("Load engine from file {}".format(model_path)) + self.model_path = model_path + else: + raise Exception("{} not exit".format(model_path)) + return self + + def parse_perf(self, data: List) -> PerformanceStats: + stats = PerformanceStats() + stats.min = float(data[0]) + stats.max = float(data[1]) + stats.mean = float(data[2]) + stats.median = float(data[3]) + stats.percentile = float(data[4]) + return stats + + def parse_log(self, log: str) -> PerformanceResult: + performance = PerformanceResult() + log_list = log.splitlines() + pattern_1 = re.compile(r"(?<=: )\d+\.?\d*") + pattern_2 = re.compile(r"(?<== )\d+\.?\d*") + for line in log_list: + if "Throughput" in line: + throughput = pattern_1.findall(line) + performance.throughput = float(throughput[0]) + elif "H2D Latency" in line: + h2d_latency = pattern_2.findall(line) + performance.h2d_latency = self.parse_perf(h2d_latency) + elif "GPU Compute Time: min" in line: + compute_time = pattern_2.findall(line) + performance.compute_time = self.parse_perf(compute_time) + elif "D2H Latency" in line: + d2h_latency = pattern_2.findall(line) + performance.d2h_latency = self.parse_perf(d2h_latency) + elif "Total Host Walltime" in line: + total_host_time = pattern_1.findall(line) + performance.host_wall_time = float(total_host_time[0]) + return performance + + def warm_up(self, dataloader: Iterable, iterations: int = 100) -> None: + pass + + def predict(self, dataloader: Iterable) -> List[AccuracyResult]: + pass + + def build(self) -> None: + pass + + def get_perf(self) -> PerformanceResult: + return self.parse_log(self.output_log) + + def run(self): + command = [ + "trtexec", + f"--onnx={self.model_path}", + f"--fp16", + ] + if self.config.duration is not None: + command.append(f"--duration={self.config.duration}") + if self.config.device is not None: + command.append(f"--device={self.config.device}") + if self.config.iterations is not None: + 
command.append(f"--iterations={self.config.iterations}") + if self.config.warmup is not None: + command.append(f"--warmUp={self.config.warmup}") + if self.config.batch is not None: + command.append(f"--batch={self.config.batch}") + + logger.info("Trtexec Build command: " + " ".join(command)) + process = subprocess.Popen( + command, stdout=subprocess.PIPE, shell=False + ) + + while process.poll() is None: + line = process.stdout.readline() + self.output_log += line.decode() + line = line.strip() + if line: + logger.info(line.decode()) + + return [] \ No newline at end of file diff --git a/tools/infer_tool/ais_bench/infer/common/__init__.py b/tools/infer_tool/ais_bench/infer/common/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/tools/infer_tool/ais_bench/infer/common/io_operations.py b/tools/infer_tool/ais_bench/infer/common/io_operations.py new file mode 100644 index 0000000000000000000000000000000000000000..f0e2be562b2619aa558bd53b5ba73927777ee3be --- /dev/null +++ b/tools/infer_tool/ais_bench/infer/common/io_operations.py @@ -0,0 +1,339 @@ +# Copyright (c) 2023-2023 Huawei Technologies Co., Ltd. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import math +import os +import random +import time +import numpy as np + +from ais_bench.infer.summary import summary +from ais_bench.infer.common.utils import ( + get_file_content, + get_file_datasize, + get_fileslist_from_dir, + list_split, + logger, + save_data_to_files, +) + +PURE_INFER_FAKE_FILE = "pure_infer_data" +PURE_INFER_FAKE_FILE_ZERO = "pure_infer_data_zero" +PURE_INFER_FAKE_FILE_RANDOM = "pure_infer_data_random" +PADDING_INFER_FAKE_FILE = "padding_infer_fake_file" + + +def convert_real_files(files): + real_files = [] + for file in files: + if file == PURE_INFER_FAKE_FILE: + raise RuntimeError("not support pure infer") + elif file.endswith(".npy") or file.endswith(".NPY"): + raise RuntimeError("not support npy file:{}".format(file)) + elif file == PADDING_INFER_FAKE_FILE: + real_files.append(files[0]) + else: + real_files.append(file) + return real_files + + +def get_pure_infer_data(size, pure_data_type): + lst = [] + if pure_data_type == "random": + # random value from [0, 255] + lst = [random.randrange(0, 256) for _ in range(size)] + else: + # zero value, default + lst = [0 for _ in range(size)] + + barray = bytearray(lst) + ndata = np.frombuffer(barray, dtype=np.uint8) + return ndata + + +# get numpy array from files list combile all files +def get_narray_from_files_list(files_list, size, pure_data_type, no_combine_tensor_mode=False): + ndatalist = [] + file_path_switch = { + PURE_INFER_FAKE_FILE: pure_data_type, + PURE_INFER_FAKE_FILE_ZERO: "zero", + PURE_INFER_FAKE_FILE_RANDOM: "random", + } + for i, file_path in enumerate(files_list): + logger.debug("get tensor from filepath:{} i:{} of all:{}".format(file_path, i, len(files_list))) + if file_path_switch.get(file_path) is not None: + ndata = get_pure_infer_data(size, 
file_path_switch.get(file_path)) + elif file_path == PADDING_INFER_FAKE_FILE: + logger.debug("padding file use fileslist[0]:{}".format(files_list[0])) + ndata = get_file_content(files_list[0]) + elif file_path is None or not os.path.exists(file_path): + logger.error('filepath:{} not valid'.format(file_path)) + raise RuntimeError() + else: + ndata = get_file_content(file_path) + ndatalist.append(ndata) + if len(ndatalist) == 1: + return ndatalist[0] + else: + ndata = np.concatenate(ndatalist) + if not no_combine_tensor_mode and ndata.nbytes != size: + logger.error('ndata size:{} not match {}'.format(ndata.nbytes, size)) + raise RuntimeError() + return ndata + + +# get tensors from files list combile all files +def get_tensor_from_files_list(files_list, session, size, pure_data_type, no_combine_tensor_mode=False): + ndata = get_narray_from_files_list(files_list, size, pure_data_type, no_combine_tensor_mode) + tensor = session.create_tensor_from_arrays_to_device(ndata) + return tensor + + +# Obtain filesperbatch runcount information according to file information and input description information +# The strategy is as follows: Judge according to the realsize and file size of input 0. If the judgment fails, +# you need to force the desired value to be set +def get_files_count_per_batch(intensors_desc, fileslist, no_combine_tensor_mode=False): + # get filesperbatch + filesize = get_file_datasize(fileslist[0][0]) + tensorsize = intensors_desc[0].realsize + if no_combine_tensor_mode: + files_count_per_batch = 1 + else: + if filesize == 0 or tensorsize % filesize != 0: + logger.error('arg0 tensorsize: {} filesize: {} not match'.format(tensorsize, filesize)) + raise RuntimeError() + else: + files_count_per_batch = (int)(tensorsize / filesize) + if files_count_per_batch == 0: + logger.error('files count per batch is zero') + raise RuntimeError() + runcount = math.ceil(len(fileslist[0]) / files_count_per_batch) + + logger.info( + "get filesperbatch files0 size:{} tensor0size:{} filesperbatch:{} runcount:{}".format( + filesize, tensorsize, files_count_per_batch, runcount + ) + ) + return files_count_per_batch, runcount + + +# Obtain tensor information and files information according to the input filelist. Create intensor form files list +# len(files_list) should equal len(intensors_desc) +def create_infileslist_from_fileslist(fileslist, intensors_desc, no_combine_tensor_mode=False): + if len(intensors_desc) != len(fileslist): + logger.error('fileslist:{} intensor:{} not match'.format(len(fileslist), len(intensors_desc))) + raise RuntimeError() + files_count_per_batch, runcount = get_files_count_per_batch(intensors_desc, fileslist, no_combine_tensor_mode) + + files_perbatch_list = [ + list(list_split(fileslist[j], files_count_per_batch, PADDING_INFER_FAKE_FILE)) + for j in range(len(intensors_desc)) + ] + + infileslist = [] + for i in range(runcount): + infiles = [] + for j in range(len(intensors_desc)): + logger.debug( + "create infileslist i:{} j:{} runcount:{} lists:{} filesPerPatch:{}".format( + i, j, runcount, files_perbatch_list[j][i], files_count_per_batch + ) + ) + infiles.append(files_perbatch_list[j][i]) + infileslist.append(infiles) + return infileslist + + +# outapi. Obtain tensor information and files information according to the input filelist. 
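# Each element of the returned list corresponds to one inference run and holds one device
# tensor per model input.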
+# Create intensor form files list +def create_intensors_from_infileslist( + infileslist, intensors_desc, session, pure_data_type, no_combine_tensor_mode=False +): + intensorslist = [] + for infiles in infileslist: + intensors = [] + for files, intensor_desc in zip(infiles, intensors_desc): + tensor = get_tensor_from_files_list( + files, session, intensor_desc.realsize, pure_data_type, no_combine_tensor_mode + ) + intensors.append(tensor) + intensorslist.append(intensors) + return intensorslist + + +def check_input_parameter(inputs_list, intensors_desc): + if len(inputs_list) == 0: + logger.error("Invalid args. Input args are empty") + raise RuntimeError() + if os.path.isfile(inputs_list[0]): + for index, file_path in enumerate(inputs_list): + realpath = os.readlink(file_path) if os.path.islink(file_path) else file_path + if not os.path.isfile(realpath): + logger.error( + "Invalid input args.--input:{} input[{}]:{} {} not exist".format( + inputs_list, index, file_path, realpath + ) + ) + raise RuntimeError() + elif os.path.isdir(inputs_list[0]): + if len(inputs_list) != len(intensors_desc): + logger.error( + "Invalid args. args input dir num:{0} not equal to model inputs num:{1}".format( + len(inputs_list), len(intensors_desc) + ) + ) + raise RuntimeError() + + for dir_path in inputs_list: + real_dir_path = os.readlink(dir_path) if os.path.islink(dir_path) else dir_path + if not os.path.isdir(real_dir_path): + logger.error("Invalid args. {} of input args is not a real dir path".format(real_dir_path)) + raise RuntimeError() + else: + logger.error("Invalid args. {} of --input is invalid".format(inputs_list[0])) + raise RuntimeError() + + +# outapi. get by input parameters of inputs_List. +def create_infileslist_from_inputs_list(inputs_list, intensors_desc, no_combine_tensor_mode=False): + check_input_parameter(inputs_list, intensors_desc) + fileslist = [] + inputlistcount = len(inputs_list) + intensorcount = len(intensors_desc) + if os.path.isfile(inputs_list[0]): + chunks = inputlistcount // intensorcount + fileslist = list(list_split(inputs_list, chunks, PADDING_INFER_FAKE_FILE)) + logger.debug( + "create intensors list file type inlistcount:{} intensorcont:{} chunks:{} files_size:{}".format( + inputlistcount, intensorcount, chunks, len(fileslist) + ) + ) + elif os.path.isdir(inputs_list[0]) and inputlistcount == intensorcount: + fileslist = [get_fileslist_from_dir(dir) for dir in inputs_list] + logger.debug( + "create intensors list dictionary type inlistcount:{} intensorcont:{} files_size:{}".format( + inputlistcount, intensorcount, len(fileslist) + ) + ) + else: + logger.error( + 'create intensors list filelists:{} intensorcont:{} error create'.format(inputlistcount, intensorcount) + ) + raise RuntimeError() + + infileslist = create_infileslist_from_fileslist(fileslist, intensors_desc, no_combine_tensor_mode) + if len(infileslist) == 0: + logger.error('create_infileslist_from_fileslist return infileslist size: {}'.format(len(infileslist))) + raise RuntimeError() + + return infileslist + + +def check_pipeline_fileslist_match_intensors(fileslist, intensors_desc): + # check intensor amount matched + if len(intensors_desc) != len(fileslist): + logger.error('fileslist:{} intensor:{} not match'.format(len(fileslist), len(intensors_desc))) + raise RuntimeError() + # check intensor size matched + for i, files in enumerate(fileslist): + filesize = get_file_datasize(files[0]) + tensorsize = intensors_desc[i].realsize + auto_mode = False + # auto_dim_mode & auto_shape_mode are exceptional cases + 
if intensors_desc[i].realsize == intensors_desc[i].size:
+            if any(dim <= 0 for dim in intensors_desc[i].shape):
+                auto_mode = True
+        if filesize != tensorsize and not auto_mode:
+            logger.error(f'tensor_num:{i} tensorsize:{tensorsize} filesize:{filesize} not match')
+            raise RuntimeError()
+
+
+# case where inputs are not combined into batches
+def create_pipeline_fileslist_from_inputs_list(inputs_list, intensors_desc):
+    check_input_parameter(inputs_list, intensors_desc)
+    fileslist = []
+    inputlistcount = len(inputs_list)
+    intensorcount = len(intensors_desc)
+    if os.path.isfile(inputs_list[0]):
+        chunks = inputlistcount // intensorcount
+        fileslist = list(list_split(inputs_list, chunks, PADDING_INFER_FAKE_FILE))
+        logger.debug(
+            f"create intensors list file type inlistcount:{inputlistcount} "
+            f"intensorcount:{intensorcount} chunks:{chunks} files_size:{len(fileslist)}"
+        )
+    elif os.path.isdir(inputs_list[0]) and inputlistcount == intensorcount:
+        fileslist = [get_fileslist_from_dir(dir_) for dir_ in inputs_list]
+        logger.debug(
+            f"create intensors list directory type inlistcount:{inputlistcount} "
+            f"intensorcount:{intensorcount} files_size:{len(fileslist)}"
+        )
+    else:
+        logger.error(f'create intensors list failed, filelists:{inputlistcount} intensorcount:{intensorcount}')
+        raise RuntimeError()
+    try:
+        check_pipeline_fileslist_match_intensors(fileslist, intensors_desc)
+    except Exception as err:
+        logger.error("fileslist and intensors not matched")
+        raise RuntimeError from err
+    infileslist = list(zip(*fileslist))
+    return infileslist
+
+
+def save_tensors_to_file(outputs, output_prefix, infiles_paths, outfmt, index, output_batchsize_axis):
+    files_count_perbatch = len(infiles_paths[0])
+    infiles_perbatch = np.transpose(infiles_paths)
+    for i, out in enumerate(outputs):
+        ndata = np.array(out)
+        if output_batchsize_axis >= len(ndata.shape):
+            logger.error(
+                "error i:{0} ndata.shape:{1} len:{2} <= output_batchsize_axis:{3} is invalid".format(
+                    i, ndata.shape, len(ndata.shape), output_batchsize_axis
+                )
+            )
+            raise RuntimeError()
+        if files_count_perbatch == 1 or ndata.shape[output_batchsize_axis] % files_count_perbatch == 0:
+            subdata = np.array_split(ndata, files_count_perbatch, output_batchsize_axis)
+            for j in range(files_count_perbatch):
+                sample_id = index * files_count_perbatch + j
+                if infiles_perbatch[j][0] == PADDING_INFER_FAKE_FILE:
+                    logger.debug(
+                        "sampleid:{} i:{} infiles:{} is padding fake file so continue".format(
+                            sample_id, i, infiles_perbatch[j]
+                        )
+                    )
+                    continue
+                file_path = os.path.join(
+                    output_prefix,
+                    "{}_{}.{}".format(os.path.basename(infiles_perbatch[j][0]).split('.')[0], i, outfmt.lower()),
+                )
+                summary.add_sample_id_infiles(sample_id, infiles_perbatch[j])
+                logger.debug(
+                    "save func: sampleid:{} i:{} infiles:{} outfile:{} fmt:{} axis:{}".format(
+                        sample_id, i, infiles_perbatch[j], file_path, outfmt, output_batchsize_axis
+                    )
+                )
+                summary.append_sample_id_outfile(sample_id, file_path)
+                save_data_to_files(file_path, subdata[j])
+        else:
+            logger.error(
+                'save out files error array shape:{} filesinfo:{} files_count_perbatch:{} '
+                'ndata.shape{}:{}'.format(
+                    ndata.shape,
+                    infiles_paths,
+                    files_count_perbatch,
+                    output_batchsize_axis,
+                    ndata.shape[output_batchsize_axis],
+                )
+            )
+            raise RuntimeError()
diff --git a/tools/infer_tool/ais_bench/infer/common/miscellaneous.py b/tools/infer_tool/ais_bench/infer/common/miscellaneous.py
new file mode 100644
index 0000000000000000000000000000000000000000..21ce01c35eb9c3a02042c889a84d942eba083691
--- /dev/null
+++ 
b/tools/infer_tool/ais_bench/infer/common/miscellaneous.py
@@ -0,0 +1,276 @@
+# Copyright (c) 2023-2023 Huawei Technologies Co., Ltd.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import os
+import sys
+import stat
+import subprocess
+import json
+import itertools
+import numpy as np
+
+from ais_bench.infer.common.utils import logger
+from ais_bench.infer.common.path_security_check import ms_open, MAX_SIZE_LIMITE_CONFIG_FILE, MAX_SIZE_LIMITE_NORMAL_FILE
+from ais_bench.infer.args_adapter import AISBenchInferArgsAdapter
+
+PERMISSION_DIR = 0o750
+
+ACL_JSON_CMD_LIST = [
+    "output",
+    "storage_limit",
+    "ascendcl",
+    "runtime_api",
+    "hccl",
+    "task_time",
+    "aicpu",
+    "aic_metrics",
+    "l2",
+    "sys_hardware_mem_freq",
+    "lcc_profiling",
+    "dvpp_freq",
+    "host_sys",
+    "host_sys_usage",
+    "host_sys_usage_freq",
+    "sys_interconnection_freq",
+    "msproftx",
+]
+
+
+def get_modules_version(name):
+    try:
+        import pkg_resources
+    except ImportError as err:
+        raise Exception("importerror") from err
+    pkg = pkg_resources.get_distribution(name)
+    return pkg.version
+
+
+def version_check(args):
+    try:
+        aclruntime_version = get_modules_version('aclruntime')
+    except Exception:
+        url = 'https://gitee.com/ascend/tools/tree/master/ais-bench_workload/tool/ais_bench'
+        logger.warning(f"can't find aclruntime, please visit {url} to install ais_bench(benchmark)")
+        # fall back to the old run mode so inference can still run without aclruntime
+        args.run_mode = "tensor"
+        return
+    if aclruntime_version != "0.0.2":
+        logger.warning(
+            f"aclruntime {aclruntime_version} version is lower, please update aclruntime by any one of the methods"
+        )
+        # set old run mode to run ok
+        args.run_mode = "tensor"
+
+
+def get_model_name(model):
+    path_list = model.split('/')
+    return path_list[-1][:-3]
+
+
+def check_valid_acl_json_for_dump(acl_json_path, model):
+    with ms_open(acl_json_path, mode="r", max_size=MAX_SIZE_LIMITE_CONFIG_FILE) as f:
+        acl_json_dict = json.load(f)
+    model_name_correct = get_model_name(model)
+    if acl_json_dict.get("dump") is not None:
+        # check validity of dump_list (model_name)
+        dump_list_val = acl_json_dict["dump"].get("dump_list")
+        if dump_list_val is not None:
+            if dump_list_val == [] or dump_list_val[0].get("model_name") != model_name_correct:
+                logger.warning(
+                    "dump failed, 'model_name' is not set or set incorrectly. 
correct" + "'model_name' should be {}".format(model_name_correct) + ) + else: + logger.warning("dump failed, acl.json need to set 'dump_list' attribute") + # check validity of dump_path + dump_path_val = acl_json_dict["dump"].get("dump_path") + if dump_path_val is not None: + if os.path.isdir(dump_path_val) and os.access(dump_path_val, os.R_OK) and os.access(dump_path_val, os.W_OK): + pass + else: + logger.warning("dump failed, 'dump_path' not exists or has no read/write permission") + else: + logger.warning("dump failed, acl.json need to set 'dump_path' attribute") + # check validity of dump_op_switch + dump_op_switch_val = acl_json_dict["dump"].get("dump_op_switch") + if dump_op_switch_val is not None and dump_op_switch_val not in {"on", "off"}: + logger.warning("dump failed, 'dump_op_switch' need to be set as 'on' or 'off'") + # check validity of dump_mode + dump_mode_val = acl_json_dict["dump"].get("dump_mode") + if dump_mode_val is not None and dump_mode_val not in {"input", "output", "all"}: + logger.warning("dump failed, 'dump_mode' need to be set as 'input', 'output' or 'all'") + return + + +def get_acl_json_path(args): + """ + get acl json path. when args.profiler is true or args.dump is True, create relative acl.json , + default current folder + """ + if args.acl_json_path is not None: + check_valid_acl_json_for_dump(args.acl_json_path, args.model) + return args.acl_json_path + if not args.profiler and not args.dump: + return None + + output_json_dict = {} + if args.profiler: + out_profiler_path = os.path.join(args.output, "profiler") + + if not os.path.exists(out_profiler_path): + os.makedirs(out_profiler_path, PERMISSION_DIR) + output_json_dict = {"profiler": {"switch": "on", "aicpu": "on", "output": out_profiler_path, "aic_metrics": ""}} + elif args.dump: + out_dump_path = os.path.join(args.output, "dump") + + if not os.path.exists(out_dump_path): + os.makedirs(out_dump_path, PERMISSION_DIR) + + model_name = args.model.split("/")[-1] + output_json_dict = { + "dump": { + "dump_path": out_dump_path, + "dump_mode": "all", + "dump_list": [{"model_name": model_name.split('.')[0]}], + } + } + + out_json_file_path = os.path.join(args.output, "acl.json") + + OPEN_FLAGS = os.O_WRONLY | os.O_CREAT | os.O_TRUNC + OPEN_MODES = stat.S_IWUSR | stat.S_IRUSR + with ms_open(out_json_file_path, mode="w") as f: + json.dump(output_json_dict, f, indent=4, separators=(", ", ": "), sort_keys=True) + return out_json_file_path + + +def get_batchsize(session, args): + intensors_desc = session.get_inputs() + batchsize = intensors_desc[0].shape[0] + if args.dym_batch != 0: + batchsize = int(args.dym_batch) + elif args.dym_dims is not None or args.dym_shape is not None: + instr = args.dym_dims if args.dym_dims is not None else args.dym_shape + elems = instr.split(';') + for elem in elems: + tmp_idx = elem.rfind(':') + name = elem[:tmp_idx] + shapestr = elem[tmp_idx + 1 :] + if name == intensors_desc[0].name: + batchsize = int(shapestr.split(',')[0]) + return batchsize + + +def get_range_list(ranges): + elems = ranges.split(';') + info_list = [] + for elem in elems: + shapes = [] + tmp_idx = elem.rfind(':') + name = elem[:tmp_idx] + shapestr = elem[tmp_idx + 1 :] + for content in shapestr.split(','): + step = 1 + if '~' in content: + start = int(content.split('~')[0]) + end = int(content.split('~')[1]) + step = int(content.split('~')[2]) if len(content.split('~')) == 3 else 1 + ranges = [str(i) for i in range(start, end + 1, step)] + elif '-' in content: + ranges = content.split('-') + else: + start = 
int(content) + ranges = [str(start)] + shapes.append(ranges) + logger.debug("content:{} get range{}".format(content, ranges)) + shape_list = [','.join(s) for s in list(itertools.product(*shapes))] + info = ["{}:{}".format(name, s) for s in shape_list] + info_list.append(info) + logger.debug("name:{} shapes:{} info:{}".format(name, shapes, info)) + + res = [';'.join(s) for s in list(itertools.product(*info_list))] + logger.debug("range list:{}".format(res)) + return res + + +# get dymshape list from input_ranges +# input_ranges can be a string like "name1:1,3,224,224;name2:1,600" or file +def get_dymshape_list(input_ranges): + ranges_list = [] + if os.path.isfile(input_ranges): + with ms_open(input_ranges, mode="rt", max_size=MAX_SIZE_LIMITE_NORMAL_FILE, encoding='utf-8') as finfo: + line = finfo.readline() + while line: + line = line.rstrip('\n') + ranges_list.append(line) + line = finfo.readline() + else: + ranges_list.append(input_ranges) + + dymshape_list = [] + for ranges in ranges_list: + dymshape_list.extend(get_range_list(ranges)) + return dymshape_list + + +# get throughput from out log +def get_throughtput_from_log(out_log): + log_list = out_log.split('\n') + for log_txt in log_list: + if "throughput" in log_txt: + throughput = float(log_txt.split(' ')[-1]) + return "OK", throughput + return "Failed", 0 + + +def regenerate_dymshape_cmd(args: AISBenchInferArgsAdapter, dym_shape): + args_dict = args.get_all_args_dict() + cmd = sys.executable + " -m ais_bench" + for key, value in args_dict.items(): + if key == '--dymShape_range': + continue + if key == '--dymShape': + cmd = cmd + " " + f"{key}={dym_shape}" + continue + if value: + cmd = cmd + " " + f"{key}={value}" + cmd_list = cmd.split(' ') + return cmd_list + + +def dymshape_range_run(args: AISBenchInferArgsAdapter): + dymshape_list = get_dymshape_list(args.dym_shape_range) + results = [] + for dymshape in dymshape_list: + cmd = regenerate_dymshape_cmd(args, dymshape) + result = {"dymshape": dymshape, "cmd": cmd, "result": "Failed", "throughput": 0} + logger.debug("cmd:{}".format(cmd)) + p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE) + stdout, _ = p.communicate(timeout=10) + out_log = stdout.decode('utf-8') + print(out_log) # show original log of cmd + result["result"], result["throughput"] = get_throughtput_from_log(out_log) + logger.info("dymshape:{} end run result:{}".format(dymshape, result["result"])) + results.append(result) + + tlist = [result["throughput"] for result in results if result["result"] == "OK"] + logger.info("-----------------dyshape_range Performance Summary------------------") + logger.info("run_count:{} success_count:{} avg_throughput:{}".format(len(results), len(tlist), np.mean(tlist))) + results.sort(key=lambda x: x['throughput'], reverse=True) + for i, result in enumerate(results): + logger.info( + "{} dymshape:{} result:{} throughput:{}".format( + i, result["dymshape"], result["result"], result["throughput"] + ) + ) + logger.info("------------------------------------------------------") diff --git a/tools/infer_tool/ais_bench/infer/common/path_security_check.py b/tools/infer_tool/ais_bench/infer/common/path_security_check.py new file mode 100644 index 0000000000000000000000000000000000000000..81ef7f4c98260f3be7f2bbcf05098e34df07f19a --- /dev/null +++ b/tools/infer_tool/ais_bench/infer/common/path_security_check.py @@ -0,0 +1,293 @@ +# Copyright (c) 2023-2023 Huawei Technologies Co., Ltd. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at + +# http://www.apache.org/licenses/LICENSE-2.0 + +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# this file is as same as components/utils/file_opem_check.py, because benchmark might be install without ait + +import os +import sys +import stat +import re +import logging + + +MAX_SIZE_UNLIMITE = -1 # 不限制,必须显式表示不限制,读取必须传入 +MAX_SIZE_LIMITE_CONFIG_FILE = 10 * 1024 * 1024 # 10M 普通配置文件,可以根据实际要求变更 +MAX_SIZE_LIMITE_NORMAL_FILE = 4 * 1024 * 1024 * 1024 # 4G 普通模型文件,可以根据实际要求变更 +MAX_SIZE_LIMITE_MODEL_FILE = 100 * 1024 * 1024 * 1024 # 100G 超大模型文件,需要确定能处理大文件,可以根据实际要求变更 + +PATH_WHITE_LIST_REGEX_WIN = re.compile(r"[^_:\\A-Za-z0-9/.-]") +PATH_WHITE_LIST_REGEX = re.compile(r"[^_A-Za-z0-9/.-]") + +PERMISSION_NORMAL = 0o640 # 普通文件 +PERMISSION_KEY = 0o600 # 密钥文件 +READ_FILE_NOT_PERMITTED_STAT = stat.S_IWGRP | stat.S_IWOTH +WRITE_FILE_NOT_PERMITTED_STAT = stat.S_IWGRP | stat.S_IWOTH | stat.S_IROTH | stat.S_IXOTH + +SOLUTION_LEVEL = 35 +SOLUTION_LEVEL_WIN = 45 +logging.addLevelName(SOLUTION_LEVEL, "\033[1;32m" + "SOLUTION" + "\033[0m") # green [SOLUTION] +logging.addLevelName(SOLUTION_LEVEL_WIN, "SOLUTION_WIN") +logging.basicConfig(stream=sys.stdout, level=logging.INFO, format='[%(levelname)s] %(message)s') +logger = logging.getLogger(__name__) + + +SOLUTION_BASE_URL = 'https://gitee.com/ascend/ait/wikis/ait_security_error_log_solution' +SOFT_LINK_SUB_URL = '/soft_link_error_log_solution' +PATH_LENGTH_SUB_URL = '/path_length_overflow_error_log_solution' +OWNER_SUB_URL = '/owner_or_ownergroup_error_log_solution' +PERMISSION_SUB_URL = '/path_permission_error_log_solution' +ILLEGAL_CHAR_SUB_URL = '/path_contain_illegal_char_error_log_solution' + + +def solution_log(content): + logger.log(SOLUTION_LEVEL, f"visit \033[1;32m {content} \033[0m for detailed solution") # green content + + +def solution_log_win(content): + logger.log(SOLUTION_LEVEL_WIN, f"visit {content} for detailed solution") + + +def is_legal_path_length(path): + if len(path) > 4096 and not sys.platform.startswith("win"): # linux total path length limit + logger.error(f"file total path{path} length out of range (4096), please check the file(or directory) path") + solution_log(SOLUTION_BASE_URL + PATH_LENGTH_SUB_URL) + return False + + if len(path) > 260 and sys.platform.startswith("win"): # windows total path length limit + logger.error(f"file total path{path} length out of range (260), please check the file(or directory) path") + solution_log_win(SOLUTION_BASE_URL + PATH_LENGTH_SUB_URL) + return False + + dirnames = path.split("/") + for dirname in dirnames: + if len(dirname) > 255: # linux single file path length limit + logger.error(f"file name{dirname} length out of range (255), please check the file(or directory) path") + solution_log(SOLUTION_BASE_URL + PATH_LENGTH_SUB_URL) + return False + return True + + +def is_match_path_white_list(path): + if PATH_WHITE_LIST_REGEX.search(path) and not sys.platform.startswith("win"): + logger.error(f"path:{path} contains illegal char, legal chars include A-Z a-z 0-9 _ - / .") + solution_log(SOLUTION_BASE_URL + ILLEGAL_CHAR_SUB_URL) + return False + if 
PATH_WHITE_LIST_REGEX_WIN.search(path) and sys.platform.startswith("win"): + logger.error(f"path:{path} contains illegal char, legal chars include A-Z a-z 0-9 _ - / . : \\") + solution_log_win(SOLUTION_BASE_URL + ILLEGAL_CHAR_SUB_URL) + return False + return True + + +def is_legal_args_path_string(path): + # only check path string + if not path: + return True + if not is_legal_path_length(path): + return False + if not is_match_path_white_list(path): + return False + return True + + +class OpenException(Exception): + pass + + +class FileStat: + def __init__(self, file) -> None: + if not is_legal_path_length(file) or not is_match_path_white_list(file): + raise OpenException(f"create FileStat failed") + self.file = file + self.is_file_exist = os.path.exists(file) + if self.is_file_exist: + self.file_stat = os.stat(file) + self.realpath = os.path.realpath(file) + else: + self.file_stat = None + + @property + def is_exists(self): + return self.is_file_exist + + @property + def is_softlink(self): + return os.path.islink(self.file) if self.file_stat else False + + @property + def is_file(self): + return stat.S_ISREG(self.file_stat.st_mode) if self.file_stat else False + + @property + def is_dir(self): + return stat.S_ISDIR(self.file_stat.st_mode) if self.file_stat else False + + @property + def file_size(self): + return self.file_stat.st_size if self.file_stat else 0 + + @property + def permission(self): + return stat.S_IMODE(self.file_stat.st_mode) if self.file_stat else 0o777 + + @property + def owner(self): + return self.file_stat.st_uid if self.file_stat else -1 + + @property + def group_owner(self): + return self.file_stat.st_gid if self.file_stat else -1 + + @property + def is_owner(self): + return self.owner == (os.geteuid() if hasattr(os, "geteuid") else 0) + + @property + def is_group_owner(self): + return self.group_owner in (os.getgroups() if hasattr(os, "getgroups") else [0]) + + @property + def is_user_or_group_owner(self): + return self.is_owner or self.is_group_owner + + @property + def is_user_and_group_owner(self): + return self.is_owner and self.is_group_owner + + def is_basically_legal(self, perm='none'): + if sys.platform.startswith("win"): + return self.check_windows_permission(perm) + else: + return self.check_linux_permission(perm) + + def check_linux_permission(self, perm='none'): + if not self.is_exists and perm != 'write': + logger.error(f"path: {self.file} not exist, please check if file or dir is exist") + return False + if self.is_softlink: + logger.error(f"path :{self.file} is a soft link, not supported, please import file(or directory) directly") + solution_log(SOLUTION_BASE_URL + SOFT_LINK_SUB_URL) + return False + if not self.is_user_or_group_owner and self.is_exists: + logger.error( + f"current user isn't path:{self.file}'s owner or ownergroup, make sure current user belong to file(or directory)'s owner or ownergroup" + ) + solution_log(SOLUTION_BASE_URL + OWNER_SUB_URL) + return False + if perm == 'read': + if self.permission & READ_FILE_NOT_PERMITTED_STAT > 0: + logger.error( + f"The file {self.file} is group writable, or is others writable, as import file(or directory), " + "permission should not be over 0o755(rwxr-xr-x)" + ) + solution_log(SOLUTION_BASE_URL + PERMISSION_SUB_URL) + return False + if not os.access(self.realpath, os.R_OK) or self.permission & stat.S_IRUSR == 0: + logger.error( + f"Current user doesn't have read permission to the file {self.file}, as import file(or directory), " + "permission should be at least 0o400(r--------) " + ) + 
solution_log(SOLUTION_BASE_URL + PERMISSION_SUB_URL) + return False + elif perm == 'write' and self.is_exists: + if self.permission & WRITE_FILE_NOT_PERMITTED_STAT > 0: + logger.error( + f"The file {self.file} is group writable, or is others writable, as export file(or directory), " + "permission should not be over 0o750(rwxr-x---)" + ) + solution_log(SOLUTION_BASE_URL + PERMISSION_SUB_URL) + return False + if not os.access(self.realpath, os.W_OK): + logger.error( + f"Current user doesn't have write permission to the file {self.file}, as export file(or directory), " + "permission should be at least 0o200(-w-------) " + ) + solution_log(SOLUTION_BASE_URL + PERMISSION_SUB_URL) + return False + return True + + def check_windows_permission(self, perm='none'): + if not self.is_exists and perm != 'write': + logger.error(f"path: {self.file} not exist, please check if file or dir is exist") + return False + if self.is_softlink: + logger.error(f"path :{self.file} is a soft link, not supported, please import file(or directory) directly") + solution_log(SOLUTION_BASE_URL + SOFT_LINK_SUB_URL) + return False + return True + + def is_legal_file_size(self, max_size): + if not self.is_file: + logger.error(f"path: {self.file} is not a file") + return False + if self.file_size > max_size: + logger.error(f"file_size:{self.file_size} byte out of max limit {max_size} byte") + return False + else: + return True + + def is_legal_file_type(self, file_types: list): + if not self.is_file and self.is_exists: + logger.error(f"path: {self.file} is not a file") + return False + for file_type in file_types: + if os.path.splitext(self.file)[1] == f".{file_type}": + return True + logger.error(f"path:{self.file}, file type not in {file_types}") + return False + + +def ms_open(file, mode="r", max_size=None, softlink=False, write_permission=PERMISSION_NORMAL, **kwargs): + file_stat = FileStat(file) + + if file_stat.is_exists and file_stat.is_dir: + raise OpenException(f"Expecting a file, but it's a folder. {file}") + + if "r" in mode: + if not file_stat.is_exists: + raise OpenException(f"No such file or directory {file}") + if max_size is None: + raise OpenException(f"Reading files must have a size limit control. {file}") + if max_size != MAX_SIZE_UNLIMITE and max_size < file_stat.file_size: + raise OpenException(f"The file size has exceeded the specifications and cannot be read. {file}") + + if "w" in mode: + if file_stat.is_exists and not file_stat.is_owner: + raise OpenException( + f"The file owner is inconsistent with the current process user and is not allowed to write. {file}" + ) + if file_stat.is_exists: + os.remove(file) + + if not softlink and file_stat.is_softlink: + raise OpenException(f"Softlink is not allowed to be opened. {file}") + + if "a" in mode: + if not file_stat.is_owner: + raise OpenException( + f"The file owner is inconsistent with the current process user and is not allowed to write. 
{file}" + ) + if file_stat.permission != (file_stat.permission & write_permission): + os.chmod(file, file_stat.permission & write_permission) + + flags = os.O_RDONLY + if "+" in mode: + flags = flags | os.O_RDWR + elif "w" in mode or "a" in mode or "x" in mode: + flags = flags | os.O_WRONLY + + if "w" in mode or "x" in mode: + flags = flags | os.O_TRUNC | os.O_CREAT + if "a" in mode: + flags = flags | os.O_APPEND | os.O_CREAT + return os.fdopen(os.open(file, flags, mode=write_permission), mode, **kwargs) diff --git a/tools/infer_tool/ais_bench/infer/common/utils.py b/tools/infer_tool/ais_bench/infer/common/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..a3a88f116ddb96bbcd3a2ac414a410c129e55e4b --- /dev/null +++ b/tools/infer_tool/ais_bench/infer/common/utils.py @@ -0,0 +1,274 @@ +# Copyright (c) 2023-2023 Huawei Technologies Co., Ltd. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +import os +import sys +import stat +import re +import uuid +from pickle import NONE +import logging +from random import sample +from string import digits, ascii_uppercase, ascii_lowercase +import json +import shutil +import shlex +import subprocess +import numpy as np +from ais_bench.infer.common.path_security_check import ( + ms_open, + MAX_SIZE_LIMITE_NORMAL_FILE, + MAX_SIZE_LIMITE_CONFIG_FILE, + FileStat, + is_legal_args_path_string, +) + +logging.basicConfig(stream=sys.stdout, level=logging.INFO, format='[%(levelname)s] %(message)s') +logger = logging.getLogger(__name__) + +PERMISSION_DIR = 0o750 +READ_WRITE_FLAGS = os.O_RDWR | os.O_CREAT +WRITE_FLAGS = os.O_WRONLY | os.O_CREAT | os.O_TRUNC +WRITE_MODES = stat.S_IWUSR | stat.S_IRUSR +MSACCUCMP_FILE_PATH = "tools/operator_cmp/compare/msaccucmp.py" +CANN_PATH = "/usr/local/Ascend/ascend-toolkit/latest" + + +# Split a List Into Even Chunks of N Elements +def list_split(list_a, n, padding_file): + for x in range(0, len(list_a), n): + every_chunk = list_a[x : n + x] + + if len(every_chunk) < n: + every_chunk = every_chunk + [padding_file for _ in range(n - len(every_chunk))] + yield every_chunk + + +def list_share(list_a, count, num, left): + head = 0 + for i in range(count): + if i < left: + every_chunk = list_a[head : head + num + 1] + head = head + num + 1 + else: + every_chunk = list_a[head : head + num] + head = head + num + yield every_chunk + + +def natural_sort(lst): + convert = lambda text: int(text) if text.isdigit() else text.lower() + alphanum_key = lambda key: [convert(c) for c in re.split('([0-9]+)', key)] + return sorted(lst, key=alphanum_key) + + +def get_fileslist_from_dir(dir_): + files_list = [] + + for f in os.listdir(dir_): + f_true_path = os.path.join(dir_, f) + f_stat = FileStat(f_true_path) + if not f_stat.is_basically_legal('read'): + raise RuntimeError(f'input data:{f_true_path} is illegal') + if f_stat.is_dir: + continue + if f.endswith(".npy") or f.endswith(".NPY") or f.endswith(".bin") or f.endswith(".BIN"): + files_list.append(os.path.join(dir_, f)) + + if len(files_list) == 0: + 
logger.error('{} of input args not find valid file,valid file format:[*.npy *.NPY *.bin *.BIN]'.format(dir_)) + raise RuntimeError() + files_list.sort() + return natural_sort(files_list) + + +def get_file_datasize(file_path): + if file_path.endswith(".NPY") or file_path.endswith(".npy"): + ndata = np.load(file_path) + return ndata.nbytes + else: + return os.path.getsize(file_path) + + +def get_file_content(file_path): + if file_path.endswith(".NPY") or file_path.endswith(".npy"): + return np.load(file_path) + else: + with ms_open(file_path, mode="rb", max_size=MAX_SIZE_LIMITE_NORMAL_FILE) as fd: + barray = fd.read() + return np.frombuffer(barray, dtype=np.int8) + + +def get_ndata_fmt(ndata): + if ndata.dtype == np.float32 or ndata.dtype == np.float16 or ndata.dtype == np.float64: + fmt = "%f" + else: + fmt = "%d" + return fmt + + +def save_data_to_files(file_path, ndata): + if file_path.endswith(".NPY") or file_path.endswith(".npy"): + with ms_open(file_path, mode="wb") as f: + np.save(f, ndata) + elif file_path.endswith(".TXT") or file_path.endswith(".txt"): + outdata = ndata.reshape(-1, ndata.shape[-1]) + fmt = get_ndata_fmt(outdata) + with ms_open(file_path, mode="wb") as f: + for i in range(outdata.shape[0]): + np.savetxt(f, np.c_[outdata[i]], fmt=fmt, newline=" ") + f.write(b"\n") + else: + with ms_open(file_path, mode="wb") as f: + ndata.tofile(f) + + +def create_fake_file_name(pure_data_type, index): + suffix = "_" + pure_data_type + "_" + str(index) + loop_max = 1000 + for _ in range(loop_max): + fname = os.path.join(os.getcwd(), "tmp-" + "".join(str(uuid.uuid4())) + suffix) + if not os.path.exists(fname): + return fname + raise RuntimeError(f'create_fake_file_name failed: inner error') + + +def get_dump_relative_paths(output_dir, timestamp): + if output_dir is None or timestamp is None: + return [] + dump_dir = os.path.join(output_dir, timestamp) + dump_relative_paths = [] + for subdir, _, files in os.walk(dump_dir): + if len(files) > 0: + dump_relative_paths.append(os.path.relpath(subdir, dump_dir)) + return dump_relative_paths + + +def get_msaccucmp_path(): + ascend_toolkit_path = os.environ.get("ASCEND_TOOLKIT_HOME") + if not is_legal_args_path_string(ascend_toolkit_path): + raise TypeError(f"ASCEND_TOOLKIT_HOME:{ascend_toolkit_path} is illegal") + if ascend_toolkit_path is None: + ascend_toolkit_path = CANN_PATH + msaccucmp_path = os.path.join(ascend_toolkit_path, MSACCUCMP_FILE_PATH) + return msaccucmp_path if os.path.exists(msaccucmp_path) else None + + +def make_dirs(path): + ret = 0 + if not os.path.exists(path): + try: + os.makedirs(path, PERMISSION_DIR) + except Exception as e: + logger.warning(f"make dir {path} failed") + ret = -1 + return ret + + +def create_tmp_acl_json(acl_json_path): + with ms_open(acl_json_path, mode="r", max_size=MAX_SIZE_LIMITE_CONFIG_FILE) as f: + acl_json_dict = json.load(f) + tmp_acl_json_path, real_dump_path, tmp_dump_path = None, None, None + + # create tmp acl.json path + acl_json_path_list = acl_json_path.split("/") + acl_json_path_list[-1] = str(uuid.uuid4()) + "_" + acl_json_path_list[-1] + tmp_acl_json_path = "/".join(acl_json_path_list) + + # change acl_json_dict + if acl_json_dict.get("dump") is not None and acl_json_dict["dump"].get("dump_path") is not None: + real_dump_path = acl_json_dict["dump"]["dump_path"] + dump_path_list = real_dump_path.split("/") + if dump_path_list[-1] == "": + dump_path_list.pop() + dump_path_list.append(str(uuid.uuid4())) + tmp_dump_path = "/".join(dump_path_list) + acl_json_dict["dump"]["dump_path"] = 
tmp_dump_path + if make_dirs(tmp_dump_path) != 0: + tmp_dump_path = None + os.remove(tmp_acl_json_path) + tmp_acl_json_path = None + + if tmp_acl_json_path is not None: + with ms_open(tmp_acl_json_path, mode="w") as f: + json.dump(acl_json_dict, f) + + return tmp_acl_json_path, real_dump_path, tmp_dump_path + + +def convert_helper(output_dir, timestamp): # convert bin file in src path and output the npy file in dest path + ''' + before: + output_dir--|--2023***2--... (原来可能存在的时间戳路径) + |--2023***3--... (原来可能存在的时间戳路径) + |--timestamp--... (移动过的bin file目录) + + after: + output_dir--|--2023***2--... (原来可能存在的时间戳路径) + |--2023***3--... (原来可能存在的时间戳路径) + |--timestamp--... (移动过的bin file目录) + |--timestamp_npy--... (转换后npy保存的目录) + ''' + dump_relative_paths = get_dump_relative_paths(output_dir, timestamp) + msaccucmp_path = get_msaccucmp_path() + python_path = sys.executable + if python_path is None: + logger.error("convert_helper failed: python executable is not found. NPY file transfer failed.") + return + if msaccucmp_path is None: + logger.error("convert_helper failed: msaccucmp.py is not found. NPY file transfer failed.") + return + if dump_relative_paths == []: + logger.error("convert_helper failed: dump_relative_paths is empty. NPY file transfer failed.") + return + for dump_relative_path in dump_relative_paths: + dump_npy_path = os.path.join(output_dir, timestamp + "_npy", dump_relative_path) + real_dump_path = os.path.join(output_dir, timestamp, dump_relative_path) + convert_cmd = f"{python_path} {msaccucmp_path} convert -d {real_dump_path} -out {dump_npy_path}" + convert_cmd_list = shlex.split(convert_cmd) + ret = subprocess.call(convert_cmd_list, shell=False) + if ret != 0: + logger.error(f"convert_helper failed: cmd {convert_cmd} execute failed") + + +def move_subdir(src_dir, dest_dir): + # move the subdir in src_dir to dest_dir return dest_dir/subdir + # and remove the src_dir + ''' + before: + src_dir--2023***1--... (bin file存在的路径) + + dest_dir--|--2023***2--... (原来可能存在的时间戳路径) + |--2023***3--... (原来可能存在的时间戳路径) + + after: + dest_dir--|--2023***2--... (原来可能存在的时间戳路径) + |--2023***3--... (原来可能存在的时间戳路径) + |--2023***1--... (bin file移动到新的目录下) + ''' + res_dest, res_subdir = None, None + subdirs = os.listdir(src_dir) + if len(subdirs) != 1: + logger.error( + "move_subdir failed: multiple or none directory under src dir %s. " "The reason might be dump failed.", + src_dir, + ) + else: + if os.path.exists(os.path.join(dest_dir, subdirs[0])): + logger.error("move_subdir failed: dest dir %s exists" % os.path.join(dest_dir, subdirs[0])) + else: + shutil.move(os.path.join(src_dir, subdirs[0]), os.path.join(dest_dir, subdirs[0])) + res_dest, res_subdir = dest_dir, subdirs[0] + return res_dest, res_subdir diff --git a/tools/infer_tool/ais_bench/infer/infer_process.py b/tools/infer_tool/ais_bench/infer/infer_process.py new file mode 100644 index 0000000000000000000000000000000000000000..3ae667657a7009be09165ad781310c682488527a --- /dev/null +++ b/tools/infer_tool/ais_bench/infer/infer_process.py @@ -0,0 +1,753 @@ +# Copyright (c) 2023-2023 Huawei Technologies Co., Ltd. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + +import logging +import math +import os +import sys +import time +import json +import shutil +import copy +import shlex +import re +import subprocess +import fcntl +from multiprocessing import Pool +from multiprocessing import Manager +import numpy as np + +from tqdm import tqdm + +from ais_bench.infer.interface import InferSession, MemorySummary +from ais_bench.infer.common.io_operations import (create_infileslist_from_inputs_list, + create_pipeline_fileslist_from_inputs_list, + create_intensors_from_infileslist, + get_narray_from_files_list, + get_tensor_from_files_list, + convert_real_files, + PURE_INFER_FAKE_FILE_ZERO, + PURE_INFER_FAKE_FILE_RANDOM, + PURE_INFER_FAKE_FILE, save_tensors_to_file, + get_pure_infer_data) +from ais_bench.infer.summary import summary +from ais_bench.infer.common.miscellaneous import (dymshape_range_run, get_acl_json_path, version_check, + get_batchsize, ACL_JSON_CMD_LIST) +from ais_bench.infer.common.utils import (get_file_content, get_file_datasize, + get_fileslist_from_dir, list_split, list_share, + save_data_to_files, create_fake_file_name, logger, + create_tmp_acl_json, move_subdir, convert_helper) +from ais_bench.infer.common.path_security_check import is_legal_args_path_string +from ais_bench.infer.args_adapter import AISBenchInferArgsAdapter +from ais_bench.infer.backends import BackendFactory +from ais_bench.infer.common.path_security_check import ms_open, MAX_SIZE_LIMITE_CONFIG_FILE + +PERMISSION_DIR = 0o750 +logging.basicConfig(stream=sys.stdout, level=logging.INFO, format='[%(levelname)s] %(message)s') +logger = logging.getLogger(__name__) + + +def set_session_options(session, args): + # 增加校验 + aipp_batchsize = -1 + if args.dym_batch != 0: + session.set_dynamic_batchsize(args.dym_batch) + aipp_batchsize = session.get_max_dym_batchsize() + elif args.dym_hw is not None: + hwstr = args.dym_hw.split(",") + session.set_dynamic_hw((int)(hwstr[0]), (int)(hwstr[1])) + elif args.dym_dims is not None: + session.set_dynamic_dims(args.dym_dims) + elif args.dym_shape is not None: + session.set_dynamic_shape(args.dym_shape) + else: + session.set_staticbatch() + + if args.batchsize is None: + args.batchsize = get_batchsize(session, args) + logger.info(f"try get model batchsize:{args.batchsize}") + + if not args.auto_set_dymshape_mode and not args.auto_set_dymdims_mode: + if args.batchsize < 0 and not args.dym_batch and not args.dym_dims and not args.dym_shape: + raise RuntimeError('dynamic batch om model detected, but dymbatch, dymdims or dymshape not set!') + + if aipp_batchsize < 0: + aipp_batchsize = args.batchsize + + # 确认模型只有一个动态 aipp input + if args.dym_shape is not None or args.auto_set_dymshape_mode: + aipp_input_exist = 0 + else: + aipp_input_exist = session.get_dym_aipp_input_exist() + logger.debug(f"aipp_input_exist: {aipp_input_exist}") + if (args.aipp_config is not None) and (aipp_input_exist == 1): + session.load_aipp_config_file(args.aipp_config, aipp_batchsize) + session.check_dym_aipp_input_exist() + elif (args.aipp_config is None) and (aipp_input_exist == 1): + logger.error("can't find aipp config file for model with dym aipp input , please check it!") + raise RuntimeError('aipp model without aipp config!') + elif (aipp_input_exist > 1): + logger.error(f"don't support more than one dynamic aipp input in model, \ + amount of aipp input is {aipp_input_exist}") + raise RuntimeError('aipp model has more than 1 aipp input!') + elif 
(aipp_input_exist == -1): + raise RuntimeError('aclmdlGetAippType failed!') + + # 设置custom out tensors size + if args.output_size is not None: + customsizes = [int(n) for n in args.output_size.split(',')] + logger.debug(f"set customsize:{customsizes}") + session.set_custom_outsize(customsizes) + + +def init_inference_session(args, acl_json_path): + session = InferSession(args.device, args.model, acl_json_path, args.debug, args.loop) + + set_session_options(session, args) + logger.debug(f"session info:{session.session}") + return session + + +def set_dymshape_shape(session, inputs): + shape_list = [] + intensors_desc = session.get_inputs() + for i, input_ in enumerate(inputs): + str_shape = [str(shape) for shape in input_.shape] + shapes = ",".join(str_shape) + dyshape = f"{intensors_desc[i].name}:{shapes}" + shape_list.append(dyshape) + dyshapes = ';'.join(shape_list) + logger.debug(f"set dymshape shape:{dyshapes}") + session.set_dynamic_shape(dyshapes) + summary.add_batchsize(inputs[0].shape[0]) + + +def set_dymdims_shape(session, inputs): + shape_list = [] + intensors_desc = session.get_inputs() + for i, input_ in enumerate(inputs): + str_shape = [str(shape) for shape in input_.shape] + shapes = ",".join(str_shape) + dydim = f"{intensors_desc[i].name}:{shapes}" + shape_list.append(dydim) + dydims = ';'.join(shape_list) + logger.debug(f"set dymdims shape:{dydims}") + session.set_dynamic_dims(dydims) + summary.add_batchsize(inputs[0].shape[0]) + + +def warmup(session, args, intensors_desc, infiles): + # prepare input data + infeeds = [] + for j, files in enumerate(infiles): + if args.run_mode == "tensor": + tensor = get_tensor_from_files_list(files, session, intensors_desc[j].realsize, + args.pure_data_type, args.no_combine_tensor_mode) + infeeds.append(tensor) + else: + narray = get_narray_from_files_list(files, intensors_desc[j].realsize, + args.pure_data_type, args.no_combine_tensor_mode) + infeeds.append(narray) + session.set_loop_count(1) + # warmup + for _ in range(args.warmup_count): + outputs = run_inference(session, args, infeeds, out_array=True) + + session.set_loop_count(args.loop) + + # reset summary info + summary.reset() + session.reset_summaryinfo() + MemorySummary.reset() + logger.info(f"warm up {args.warmup_count} done") + + +def run_inference(session, args, inputs, out_array=False): + if args.auto_set_dymshape_mode: + set_dymshape_shape(session, inputs) + elif args.auto_set_dymdims_mode: + set_dymdims_shape(session, inputs) + outputs = session.run(inputs, out_array) + return outputs + + +def run_pipeline_inference(session, args, infileslist, output_prefix, extra_session): + out = output_prefix if output_prefix is not None else "" + pure_infer_mode = False + if args.input is None: + pure_infer_mode = True + session.run_pipeline(infileslist, + out, + args.auto_set_dymshape_mode, + args.auto_set_dymdims_mode, + args.outfmt, + pure_infer_mode, + [s.session for s in extra_session]) + + +# tensor to loop infer +def infer_loop_tensor_run(session, args, intensors_desc, infileslist, output_prefix): + for i, infiles in enumerate(tqdm(infileslist, file=sys.stdout, desc='Inference tensor Processing')): + intensors = [] + for j, files in enumerate(infiles): + tensor = get_tensor_from_files_list(files, session, intensors_desc[j].realsize, + args.pure_data_type, args.no_combine_tensor_mode) + intensors.append(tensor) + outputs = run_inference(session, args, intensors) + session.convert_tensors_to_host(outputs) + if output_prefix is not None: + save_tensors_to_file( + outputs, 
output_prefix, infiles, + args.outfmt, i, args.output_batchsize_axis + ) + + +# files to loop iner +def infer_loop_files_run(session, args, intensors_desc, infileslist, output_prefix): + for i, infiles in enumerate(tqdm(infileslist, file=sys.stdout, desc='Inference files Processing')): + intensors = [] + for j, files in enumerate(infiles): + real_files = convert_real_files(files) + tensor = session.create_tensor_from_fileslist(intensors_desc[j], real_files) + intensors.append(tensor) + outputs = run_inference(session, args, intensors) + session.convert_tensors_to_host(outputs) + if output_prefix is not None: + save_tensors_to_file( + outputs, output_prefix, infiles, + args.outfmt, i, args.output_batchsize_axis + ) + + +# First prepare the data, then execute the reference, and then write the file uniformly +def infer_fulltensors_run(session, args, intensors_desc, infileslist, output_prefix): + outtensors = [] + intensorslist = create_intensors_from_infileslist(infileslist, intensors_desc, session, + args.pure_data_type, args.no_combine_tensor_mode) + + for inputs in tqdm(intensorslist, file=sys.stdout, desc='Inference Processing full'): + outputs = run_inference(session, args, inputs) + outtensors.append(outputs) + + for i, outputs in enumerate(outtensors): + session.convert_tensors_to_host(outputs) + if output_prefix is not None: + save_tensors_to_file( + outputs, output_prefix, infileslist[i], + args.outfmt, i, args.output_batchsize_axis + ) + + +# loop numpy array to infer +def infer_loop_array_run(session, args, intensors_desc, infileslist, output_prefix): + for i, infiles in enumerate(tqdm(infileslist, file=sys.stdout, desc='Inference array Processing')): + innarrays = [] + for j, files in enumerate(infiles): + narray = get_narray_from_files_list(files, intensors_desc[j].realsize, args.pure_data_type) + innarrays.append(narray) + outputs = run_inference(session, args, innarrays) + session.convert_tensors_to_host(outputs) + if args.output is not None: + save_tensors_to_file( + outputs, output_prefix, infiles, + args.outfmt, i, args.output_batchsize_axis + ) + + +def infer_pipeline_run(session, args, infileslist, output_prefix, extra_session): + logger.info(f"run in pipeline mode with computing threadsnumber:{args.threads}") + run_pipeline_inference(session, args, infileslist, output_prefix, extra_session) + + +def get_file_name(file_path: str, suffix: str, res_file_path: list) -> list: + """获取路径下的指定文件类型后缀的文件 + Args: + file_path: 文件夹的路径 + suffix: 要提取的文件类型的后缀 + res_file_path: 保存返回结果的列表 + Returns: 文件路径 + """ + for file in os.listdir(file_path): + + if os.path.isdir(os.path.join(file_path, file)): + get_file_name(os.path.join(file_path, file), suffix, res_file_path) + else: + res_file_path.append(os.path.join(file_path, file)) + # endswith:表示以suffix结尾。可根据需要自行修改;如:startswith:表示以suffix开头,__contains__:包含suffix字符串 + if suffix == '' or suffix is None: + return res_file_path + else: + return list(filter(lambda x: x.endswith(suffix), res_file_path)) + + +def get_legal_json_content(acl_json_path): + cmd_dict = {} + with ms_open(acl_json_path, mode="r", max_size=MAX_SIZE_LIMITE_CONFIG_FILE) as f: + json_dict = json.load(f) + profile_dict = json_dict.get("profiler") + for option_cmd in ACL_JSON_CMD_LIST: + if profile_dict.get(option_cmd): + if option_cmd == "output" and not is_legal_args_path_string(profile_dict.get(option_cmd)): + raise Exception(f"output path in acl_json is illegal!") + cmd_dict.update({"--" + option_cmd.replace('_', '-'): profile_dict.get(option_cmd)}) + if (option_cmd == 
"sys_hardware_mem_freq"): + cmd_dict.update({"--sys-hardware-mem": "on"}) + if (option_cmd == "sys_interconnection_freq"): + cmd_dict.update({"--sys-interconnection-profiling": "on"}) + if (option_cmd == "dvpp_freq"): + cmd_dict.update({"--dvpp-profiling": "on"}) + return cmd_dict + + +def json_to_msprof_cmd(acl_json_path): + json_dict = get_legal_json_content(acl_json_path) + msprof_option_cmd = " ".join([f"{key}={value}" for key, value in json_dict.items()]) + return msprof_option_cmd + + +def regenerate_cmd(args:AISBenchInferArgsAdapter): + args_dict = args.get_all_args_dict() + cmd = sys.executable + " -m ais_bench" + for key, value in args_dict.items(): + if key == '--acl_json_path': + continue + if key == '--warmup_count': + cmd = cmd + " " + f"{key}={0}" + continue + if key == '--profiler': + cmd = cmd + " " + f"{key}={0}" + continue + if value: + cmd = cmd + " " + f"{key}={value}" + return cmd + + +def msprof_run_profiling(args, msprof_bin): + if args.acl_json_path is not None: + # acl.json to msprof cmd + args.profiler_rename = False + cmd = regenerate_cmd(args) + msprof_cmd = f"{msprof_bin} --application=\"{cmd}\" " + json_to_msprof_cmd(args.acl_json_path) + else: + # default msprof cmd + cmd = regenerate_cmd(args) + msprof_cmd = f"{msprof_bin} --output={args.output}/profiler --application=\"{cmd}\" --model-execution=on \ + --sys-hardware-mem=on --sys-cpu-profiling=off --sys-profiling=off --sys-pid-profiling=off \ + --dvpp-profiling=on --runtime-api=on --task-time=on --aicpu=on" \ + + ret = -1 + msprof_cmd_list = shlex.split(msprof_cmd) + logger.info(f"msprof cmd:{msprof_cmd} begin run") + if (args.profiler_rename): + p = subprocess.Popen(msprof_cmd_list, stdout=subprocess.PIPE, shell=False, bufsize=0) + flags = fcntl.fcntl(p.stdout, fcntl.F_GETFL) + fcntl.fcntl(p.stdout, fcntl.F_SETFL, flags | os.O_NONBLOCK) + + get_path_flag = True + sub_str = "" + for line in iter(p.stdout.read, b''): + if not line: + continue + line = line.decode() + if (get_path_flag and line.find("PROF_") != -1): + get_path_flag = False + start_index = line.find("PROF_") + sub_str = line[start_index:(start_index + 46)] # PROF_XXXX的目录长度为46 + print(f'{line}', flush=True, end="") + p.stdout.close() + p.wait() + + output_prefix = os.path.join(args.output, "profiler") + output_prefix = os.path.join(output_prefix, sub_str) + hash_str = sub_str.rsplit('_')[-1] + file_name = get_file_name(output_prefix, ".csv", []) + file_name_json = get_file_name(output_prefix, ".json", []) + + model_name = os.path.basename(args.model).split(".")[0] + for file in file_name: + real_file = os.path.splitext(file)[0] + os.rename(file, real_file + "_" + model_name + "_" + hash_str + ".csv") + for file in file_name_json: + real_file = os.path.splitext(file)[0] + os.rename(file, real_file + "_" + model_name + "_" + hash_str + ".json") + ret = 0 + else: + ret = subprocess.call(msprof_cmd_list, shell=False) + logger.info(f"msprof cmd:{msprof_cmd} end run ret:{ret}") + return ret + + +def get_energy_consumption(npu_id): + cmd = f"npu-smi info -t power -i {npu_id}" + get_npu_id = subprocess.run(cmd.split(), shell=False, stdout=subprocess.PIPE, stderr=subprocess.PIPE) + npu_id = get_npu_id.stdout.decode('gb2312') + power = [] + npu_id = npu_id.split("\n") + for key in npu_id: + if key.find("Power Dissipation(W)", 0, len(key)) != -1: + power = key[34:len(key)] + break + + return power + + +def convert(tmp_acl_json_path, real_dump_path, tmp_dump_path): + if real_dump_path is not None and tmp_dump_path is not None: + output_dir, timestamp = 
move_subdir(tmp_dump_path, real_dump_path) + convert_helper(output_dir, timestamp) + if tmp_dump_path is not None: + shutil.rmtree(tmp_dump_path) + if tmp_acl_json_path is not None: + os.remove(tmp_acl_json_path) + + +def main(args, index=0, msgq=None, device_list=None): + # if msgq is not None,as subproces run + if msgq is not None: + logger.info(f"subprocess_{index} main run") + + if args.debug: + logger.setLevel(logging.DEBUG) + + acl_json_path = get_acl_json_path(args) + tmp_acl_json_path = None + if args.dump_npy and acl_json_path is not None: + tmp_acl_json_path, real_dump_path, tmp_dump_path = create_tmp_acl_json(acl_json_path) + + session = init_inference_session(args, tmp_acl_json_path if tmp_acl_json_path is not None else acl_json_path) + # if pipeline is set and threads number is > 1, create a session pool for extra computing + extra_session = [] + if args.pipeline: + extra_session = [init_inference_session(args, tmp_acl_json_path if tmp_acl_json_path is not None\ + else acl_json_path) for _ in range(args.threads - 1)] + + intensors_desc = session.get_inputs() + if device_list is not None and len(device_list) > 1: + if args.output is not None: + if args.output_dirname is None: + timestr = time.strftime("%Y_%m_%d-%H_%M_%S") + output_prefix = os.path.join(args.output, timestr) + output_prefix = os.path.join(output_prefix, "device" + str(device_list[index]) + "_" + str(index)) + else: + output_prefix = os.path.join(args.output, args.output_dirname) + output_prefix = os.path.join(output_prefix, "device" + str(device_list[index]) + "_" + str(index)) + if not os.path.exists(output_prefix): + os.makedirs(output_prefix, PERMISSION_DIR) + os.chmod(args.output, PERMISSION_DIR) + logger.info(f"output path:{output_prefix}") + else: + output_prefix = None + else: + if args.output is not None: + if args.output_dirname is None: + timestr = time.strftime("%Y_%m_%d-%H_%M_%S") + output_prefix = os.path.join(args.output, timestr) + else: + output_prefix = os.path.join(args.output, args.output_dirname) + if not os.path.exists(output_prefix): + os.makedirs(output_prefix, PERMISSION_DIR) + os.chmod(args.output, PERMISSION_DIR) + logger.info(f"output path:{output_prefix}") + else: + output_prefix = None + + inputs_list = [] if args.input is None else args.input.split(',') + + # create infiles list accord inputs list + if len(inputs_list) == 0: + # Pure reference scenario. 
Create input zero data + if not args.pipeline: + infileslist = [[[PURE_INFER_FAKE_FILE] for _ in intensors_desc]] + else: + infileslist = [[]] + pure_file = PURE_INFER_FAKE_FILE_ZERO if args.pure_data_type == "zero" else PURE_INFER_FAKE_FILE_RANDOM + for _ in intensors_desc: + infileslist[0].append(pure_file) + else: + if not args.pipeline: + infileslist = create_infileslist_from_inputs_list(inputs_list, intensors_desc, args.no_combine_tensor_mode) + else: + infileslist = create_pipeline_fileslist_from_inputs_list(inputs_list, intensors_desc) + if not args.pipeline: + warmup(session, args, intensors_desc, infileslist[0]) + else: + # prepare for pipeline case + infiles = [] + for file in infileslist[0]: + infiles.append([file]) + warmup(session, args, intensors_desc, infiles) + for sess in extra_session: + warmup(sess, args, intensors_desc, infiles) + + if args.pipeline and (args.auto_set_dymshape_mode or args.auto_set_dymdims_mode): + for file_list in infileslist: + input_first = np.load(file_list[0]) + summary.add_batchsize(input_first.shape[0]) + + if msgq is not None: + # wait subprocess init ready, if time eplapsed, force ready run + logger.info(f"subprocess_{index} qsize:{msgq.qsize()} now waiting") + msgq.put(index) + time_sec = 0 + while True: + if msgq.qsize() >= args.subprocess_count: + break + time_sec = time_sec + 1 + if time_sec > 10: + logger.warning(f"subprocess_{index} qsize:{msgq.qsize()} time:{time_sec} s elapsed") + break + time.sleep(1) + logger.info(f"subprocess_{index} qsize:{msgq.qsize()} ready to infer run") + + start_time = time.time() + if args.energy_consumption: + start_energy_consumption = get_energy_consumption(args.npu_id) + if args.pipeline: + infer_pipeline_run(session, args, infileslist, output_prefix, extra_session) + else: + run_mode_switch = { + "array": infer_loop_array_run, + "files": infer_loop_files_run, + "full": infer_fulltensors_run, + "tensor": infer_loop_tensor_run + } + if run_mode_switch.get(args.run_mode) is not None: + run_mode_switch.get(args.run_mode)(session, args, intensors_desc, infileslist, output_prefix) + else: + raise RuntimeError(f'wrong run_mode:{args.run_mode}') + if args.energy_consumption: + end_energy_consumption = get_energy_consumption(args.npu_id) + end_time = time.time() + + multi_threads_mode = args.threads > 1 and args.pipeline + summary.add_args(sys.argv) + s = session.summary() + if multi_threads_mode: + summary.npu_compute_time_interval_list = s.exec_time_list + else: + summary.npu_compute_time_list = [end_time - start_time for start_time, end_time in s.exec_time_list] + summary.h2d_latency_list = MemorySummary.get_h2d_time_list() + summary.d2h_latency_list = MemorySummary.get_d2h_time_list() + summary.report(args.batchsize, output_prefix, args.display_all_summary, multi_threads_mode) + try: + if args.energy_consumption: + energy_consumption = ((float(end_energy_consumption) + float(start_energy_consumption)) / 2.0) \ + * (end_time - start_time) + logger.info(f"NPU ID:{args.npu_id} energy consumption(J):{energy_consumption}") + except AttributeError as err: + logger.error(f"Attribute Access Error: {err}") + raise RuntimeError("Error accessing an attribute, please verify if the NPU ID is correct. 
") from err + except Exception as err: + logger.error(f"Unexpected Error: {err}") + raise RuntimeError( + "Energy consumption append an unexpected error occurred, please check the input parameters.") from err + + if msgq is not None: + # put result to msgq + msgq.put([index, summary.infodict['throughput'], start_time, end_time]) + + session.free_resource() + for sess in extra_session: + sess.free_resource() + + InferSession.finalize() + + if args.dump_npy and acl_json_path is not None: + convert(tmp_acl_json_path, real_dump_path, tmp_dump_path) + + +def print_subproces_run_error(value): + logger.error(f"subprocess run failed error_callback:{value}") + + +def seg_input_data_for_multi_process(args, inputs, jobs): + inputs_list = [] if inputs is None else inputs.split(',') + if inputs_list is None: + return inputs_list + + fileslist = [] + if os.path.isfile(inputs_list[0]): + fileslist = inputs_list + elif os.path.isdir(inputs_list[0]): + for dir_path in inputs_list: + fileslist.extend(get_fileslist_from_dir(dir_path)) + else: + logger.error(f'error {inputs_list[0]} not file or dir') + raise RuntimeError() + + args.device = 0 + acl_json_path = get_acl_json_path(args) + session = init_inference_session(args, acl_json_path) + intensors_desc = session.get_inputs() + try: + chunks_elements = math.ceil(len(fileslist) / len(intensors_desc)) + except ZeroDivisionError as err: + logger.error("ZeroDivisionError: intensors_desc is empty") + raise RuntimeError("error zero division") from err + chunks = list(list_split(fileslist, chunks_elements, None)) + fileslist = [[] for _ in range(jobs)] + for _, chunk in enumerate(chunks): + try: + splits_elements = int(len(chunk) / jobs) + except ZeroDivisionError as err: + logger.error("ZeroDivisionError: intensors_desc is empty") + raise RuntimeError("error zero division") from err + splits_left = len(chunk) % jobs + splits = list(list_share(chunk, jobs, splits_elements, splits_left)) + for j, split in enumerate(splits): + fileslist[j].extend(split) + res = [] + for files in fileslist: + res.append(','.join(list(filter(None, files)))) + return res + + +def multidevice_run(args): + logger.info(f"multidevice:{args.device} run begin") + device_list = args.device + npu_id_list = args.npu_id + p = Pool(len(device_list)) + msgq = Manager().Queue() + args.subprocess_count = len(device_list) + splits = None + if (args.input is not None and args.divide_input): + jobs = args.subprocess_count + splits = seg_input_data_for_multi_process(args, args.input, jobs) + + for i, device in enumerate(device_list): + cur_args = copy.deepcopy(args) + cur_args.device = int(device) + if args.energy_consumption: + cur_args.npu_id = int(npu_id_list[i]) + if args.divide_input: + cur_args.input = None if splits is None else list(splits)[i] + p.apply_async(main, args=(cur_args, i, msgq, device_list), error_callback=print_subproces_run_error) + + p.close() + p.join() + result = 0 if 2 * len(device_list) == msgq.qsize() else 1 + logger.info(f"multidevice run end qsize:{msgq.qsize()} result:{result}") + tlist = [] + while msgq.qsize() != 0: + ret = msgq.get() + if type(ret) == list: + logger.info(f"i:{ret[0]} device_{device_list[ret[0]]} throughput:{ret[1]} \ + start_time:{ret[2]} end_time:{ret[3]}") + tlist.append(ret[1]) + logger.info(f'summary throughput:{sum(tlist)}') + return result + + +def args_rules(args): + if args.profiler and args.dump: + logger.error("parameter --profiler cannot be true at the same time as parameter --dump, please check them!\n") + raise RuntimeError('error bad 
parameters --profiler and --dump') + + if (args.profiler or args.dump) and (args.output is None): + logger.error("when dump or profiler, miss output path, please check them!") + raise RuntimeError('miss output parameter!') + + if not args.auto_set_dymshape_mode and not args.auto_set_dymdims_mode: + args.no_combine_tensor_mode = False + else: + args.no_combine_tensor_mode = True + + if args.profiler and args.warmup_count != 0 and args.input is not None: + logger.info("profiler mode with input change warmup_count to 0") + args.warmup_count = 0 + + if args.output is None and args.output_dirname is not None: + logger.error( + "parameter --output_dirname cann't be used alone. Please use it together with the parameter --output!\n") + raise RuntimeError('error bad parameters --output_dirname') + + if args.threads > 1 and not args.pipeline: + logger.info("need to set --pipeline when setting threads number to be more than one.") + args.threads = 1 + + return args + + +def acl_json_base_check(args): + if args.acl_json_path is None: + return args + json_path = args.acl_json_path + try: + with ms_open(json_path, mode="r", max_size=MAX_SIZE_LIMITE_CONFIG_FILE) as f: + json_dict = json.load(f) + except Exception as err: + logger.error(f"can't read acl_json_path:{json_path}") + raise Exception from err + if json_dict.get("profiler") is not None and json_dict.get("profiler").get("switch") == "on": + args.profiler = True + if json_dict.get("dump") is not None: + args.profiler = False + return args + + +def config_check(config_path): + if not config_path: + return + max_config_size = 12800 + if os.path.splitext(config_path)[1] != ".config": + logger.error(f"aipp_config:{config_path} is not a .config file") + raise TypeError(f"aipp_config:{config_path} is not a .config file") + config_size = os.path.getsize(config_path) + if config_size > max_config_size: + logger.error(f"aipp_config_size:{config_size} byte out of max limit {max_config_size} byte") + raise MemoryError(f"aipp_config_size:{config_size} byte out of max limit") + return + + +def backend_run(args): + backend_class = BackendFactory.create_backend(args.backend) + backend = backend_class(args) + backend.load(args.model) + backend.run() + perf = backend.get_perf() + logger.info(f"perf info:{perf}") + + +def infer_process(args:AISBenchInferArgsAdapter): + args = args_rules(args) + version_check(args) + args = acl_json_base_check(args) + + if args.perf: + backend_run(args) + return 0 + + if args.profiler: + # try use msprof to run + msprof_bin = shutil.which('msprof') + if msprof_bin is None: + logger.info("find no msprof continue use acl.json mode, result won't be parsed as csv") + elif os.getenv('AIT_NO_MSPROF_MODE') == '1': + logger.info("find AIT_NO_MSPROF_MODE set, continue use acl.json mode, result won't be parsed as csv") + else: + ret = msprof_run_profiling(args, msprof_bin) + return ret + + if args.dym_shape_range is not None and args.dym_shape is None: + # dymshape range run,according range to run each shape infer get best shape + dymshape_range_run(args) + return 0 + + if type(args.device) == list: + # args has multiple device, run single process for each device + ret = multidevice_run(args) + return ret + + main(args) + return 0 diff --git a/tools/infer_tool/ais_bench/infer/interface.py b/tools/infer_tool/ais_bench/infer/interface.py new file mode 100644 index 0000000000000000000000000000000000000000..7719a8e6b3b7138bb34154497b8f7bcbb4bf9774 --- /dev/null +++ b/tools/infer_tool/ais_bench/infer/interface.py @@ -0,0 +1,889 @@ +# Copyright (c) 
2023-2023 Huawei Technologies Co., Ltd. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import logging +import time +import sys +from configparser import ConfigParser +from multiprocessing import Pool +from multiprocessing import Manager +import numpy as np +import aclruntime + + +SRC_IMAGE_SIZE_W_MIN = 2 +SRC_IMAGE_SIZE_W_MAX = 4096 +SRC_IMAGE_SIZE_H_MIN = 1 +SRC_IMAGE_SIZE_H_MAX = 4096 +RBUV_SWAP_SWITCH_OFF = 0 +RBUV_SWAP_SWITCH_ON = 1 +AX_SWAP_SWITCH_OFF = 0 +AX_SWAP_SWITCH_ON = 1 +CSC_SWITCH_OFF = 0 +CSC_SWITCH_ON = 0 +CSC_MATRIX_MIN = -32677 +CSC_MATRIX_MAX = 32676 +CROP_SWITCH_OFF = 0 +CROP_SWITCH_ON = 1 +LOAD_START_POS_W_MIN = 0 +LOAD_START_POS_W_MAX = 4095 +LOAD_START_POS_H_MIN = 0 +LOAD_START_POS_H_MAX = 4095 +CROP_POS_W_MIN = 1 +CROP_POS_W_MAX = 4096 +CROP_POS_H_MIN = 1 +CROP_POS_H_MAX = 4096 +PADDING_SWITCH_OFF = 0 +PADDING_SWITCH_ON = 1 +PADDING_SIZE_MIN = 0 +PADDING_SIZE_MAX = 32 +PIXEL_MEAN_CHN_MIN = 0 +PIXEL_MEAN_CHN_MAX = 255 +PIXEL_MIN_CHN_MIN = 0 +PIXEL_MIN_CHN_MAX = 255 +PIXEL_VAR_RECI_CHN_MIN = -65504 +PIXEL_VAR_RECI_CHN_MAX = 65504 + +TORCH_TENSOR_LIST = [ + 'torch.FloatTensor', 'torch.DoubleTensor', 'torch.HalfTensor', 'torch.BFloat16Tensor', + 'torch.ByteTensor', 'torch.CharTensor', 'torch.ShortTensor', 'torch.LongTensor', + 'torch.BoolTensor', 'torch.IntTensor' +] +NP_TYPE_LIST = [ + np.int8, np.int16, np.int32, np.int64, np.uint8, np.uint16, + np.uint32, np.float16, np.float32, np.float64 +] + +logger = logging.getLogger(__name__) + + +class InferSession: + def __init__(self, device_id: int, model_path: str, acl_json_path: str = None, + debug: bool = False, loop: int = 1): + """ + init InferSession + + Args: + device_id: device id for npu device + model_path: om model path to load + acl_json_path: set acl_json_path to enable profiling or dump function + debug: enable debug log. Default: False + loop: loop count for one inference. 
Default: 1 + """ + self.device_id = device_id + self.model_path = model_path + self.loop = loop + self.options = aclruntime.session_options() + self.acl_json_path = acl_json_path + self.debug = debug + if acl_json_path is not None: + self.options.acl_json_path = self.acl_json_path + self.options.log_level = 1 if self.debug else 2 + self.options.loop = self.loop + self.session = aclruntime.InferenceSession(self.model_path, self.device_id, self.options) + self.outputs_names = [meta.name for meta in self.session.get_outputs()] + self.intensors_desc = self.session.get_inputs() + self.outtensors_desc = self.session.get_outputs() + self.infer_mode_switch = { + "static": self._static_prepare, + "dymbatch": self._dymbatch_prepare, + "dymhw": self._dymhw_prepare, + "dymdims": self._dymdims_prepare, + "dymshape": self._dymshape_prepare + } + + @staticmethod + def convert_tensors_to_host(tensors): + for tensor in tensors: + tensor.to_host() + + @staticmethod + def convert_tensors_to_arrays(tensors): + arrays = [] + for tensor in tensors: + # convert acltensor to numpy array + arrays.append(np.array(tensor)) + return arrays + + @staticmethod + def finalize(): + if hasattr(aclruntime.InferenceSession, 'finalize'): + aclruntime.InferenceSession.finalize() + + def get_inputs(self): + """ + get inputs info of model + """ + self.intensors_desc = self.session.get_inputs() + return self.intensors_desc + + def get_outputs(self): + """ + get outputs info of model + """ + self.outtensors_desc = self.session.get_outputs() + return self.outtensors_desc + + def set_loop_count(self, loop): + options = self.session.options() + options.loop = loop + + # 默认设置为静态batch + def set_staticbatch(self): + self.session.set_staticbatch() + + def set_dynamic_batchsize(self, dym_batch: str): + self.session.set_dynamic_batchsize(dym_batch) + + def set_dynamic_hw(self, w: int, h: int): + self.session.set_dynamic_hw(w, h) + + def get_max_dym_batchsize(self): + return self.session.get_max_dym_batchsize() + + def set_dynamic_dims(self, dym_dims: str): + self.session.set_dynamic_dims(dym_dims) + + def set_dynamic_shape(self, dym_shape: str): + self.session.set_dynamic_shape(dym_shape) + + def set_custom_outsize(self, custom_sizes): + self.session.set_custom_outsize(custom_sizes) + + def create_tensor_from_fileslist(self, desc, files): + return self.session.create_tensor_from_fileslist(desc, files) + + def create_tensor_from_arrays_to_device(self, arrays): + tensor = aclruntime.Tensor(arrays) + tensor.to_device(self.device_id) + return tensor + + def get_dym_aipp_input_exist(self): + return self.session.get_dym_aipp_input_exist() + + def check_dym_aipp_input_exist(self): + self.session.check_dym_aipp_input_exist() + + def load_aipp_config_file(self, config_file, batchsize): + cfg = ConfigParser() + cfg.read(config_file, 'UTF-8') + session_list = cfg.sections() + #多个aipp输入不支持 + if (session_list.count('aipp_op') != 1): + logger.error("nums of section aipp_op in .config file is not supported, please check it!") + raise RuntimeError('wrong aipp config file content!') + option_list = cfg.options('aipp_op') + if (option_list.count('input_format') == 1): + self.aipp_set_input_format(cfg) + else: + logger.error("can not find input_format in config file, please check it!") + raise RuntimeError('wrong aipp config file content!') + + if (option_list.count('src_image_size_w') == 1 and option_list.count('src_image_size_h') == 1): + self.aipp_set_src_image_size(cfg) + else: + logger.error("can not find src_image_size in config file, please check 
it!") + raise RuntimeError('wrong aipp config file content!') + self.session.aipp_set_max_batch_size(batchsize) + self.aipp_set_rbuv_swap_switch(cfg, option_list) + self.aipp_set_ax_swap_switch(cfg, option_list) + self.aipp_set_csc_params(cfg, option_list) + self.aipp_set_crop_params(cfg, option_list) + self.aipp_set_padding_params(cfg, option_list) + self.aipp_set_dtc_pixel_mean(cfg, option_list) + self.aipp_set_dtc_pixel_min(cfg, option_list) + self.aipp_set_pixel_var_reci(cfg, option_list) + + ret = self.session.set_dym_aipp_info_set() + return ret + + def aipp_set_input_format(self, cfg): + input_format = cfg.get('aipp_op', 'input_format') + legal_format = ["YUV420SP_U8", "XRGB8888_U8", "RGB888_U8", "YUV400_U8"] + if (legal_format.count(input_format) == 1): + self.session.aipp_set_input_format(input_format) + else: + logger.error("input_format in config file is illegal, please check it!") + raise RuntimeError('wrong aipp config file content!') + + def aipp_set_src_image_size(self, cfg): + src_image_size = list() + tmp_size_w = cfg.getint('aipp_op', 'src_image_size_w') + tmp_size_h = cfg.getint('aipp_op', 'src_image_size_h') + if (SRC_IMAGE_SIZE_W_MIN <= tmp_size_w <= SRC_IMAGE_SIZE_W_MAX): + src_image_size.append(tmp_size_w) + else: + logger.error("src_image_size_w in config file out of range, please check it!") + raise RuntimeError('wrong aipp config file content!') + if (SRC_IMAGE_SIZE_H_MIN <= tmp_size_h <= SRC_IMAGE_SIZE_H_MAX): + src_image_size.append(tmp_size_h) + else: + logger.error("src_image_size_h in config file out of range, please check it!") + raise RuntimeError('wrong aipp config file content!') + + self.session.aipp_set_src_image_size(src_image_size) + + def aipp_set_rbuv_swap_switch(self, cfg, option_list): + if (option_list.count('rbuv_swap_switch') == 0): + self.session.aipp_set_rbuv_swap_switch(RBUV_SWAP_SWITCH_OFF) + return + tmp_rs_switch = cfg.getint('aipp_op', 'rbuv_swap_switch') + if (tmp_rs_switch == RBUV_SWAP_SWITCH_OFF or tmp_rs_switch == RBUV_SWAP_SWITCH_ON): + self.session.aipp_set_rbuv_swap_switch(tmp_rs_switch) + else: + logger.error("rbuv_swap_switch in config file out of range, please check it!") + raise RuntimeError('wrong aipp config file content!') + + def aipp_set_ax_swap_switch(self, cfg, option_list): + if (option_list.count('ax_swap_switch') == 0): + self.session.aipp_set_ax_swap_switch(AX_SWAP_SWITCH_OFF) + return + tmp_as_switch = cfg.getint('aipp_op', 'ax_swap_switch') + if (tmp_as_switch == AX_SWAP_SWITCH_OFF or tmp_as_switch == AX_SWAP_SWITCH_ON): + self.session.aipp_set_ax_swap_switch(tmp_as_switch) + else: + logger.error("ax_swap_switch in config file out of range, please check it!") + raise RuntimeError('wrong aipp config file content!') + + def aipp_set_csc_params(self, cfg, option_list): + if (option_list.count('csc_switch') == 0): + tmp_csc_switch = CSC_SWITCH_OFF + else: + tmp_csc_switch = cfg.getint('aipp_op', 'csc_switch') + + if (tmp_csc_switch == CSC_SWITCH_OFF): + tmp_csc_params = [0] * 16 + elif (tmp_csc_switch == CSC_SWITCH_ON): + tmp_csc_params = list() + tmp_csc_params.append(tmp_csc_switch) + options = [ + 'matrix_r0c0', 'matrix_r0c1', 'matrix_r0c2', 'matrix_r1c0', 'matrix_r1c1', 'matrix_r1c2', + 'matrix_r2c0', 'matrix_r2c1', 'matrix_r2c2', 'output_bias_0', 'output_bias_1', 'output_bias_2', + 'input_bias_0', 'input_bias_1', 'input_bias_2' + ] + for option in options: + tmp_csc_params.append(0 if option_list.count(option) == 0 else cfg.getint('aipp_op', option)) + + range_ok = True + for i in range(1, 9): + range_ok = 
range_ok and (CSC_MATRIX_MIN <= tmp_csc_params[i] <= CSC_MATRIX_MAX) + for i in range(10, 15): + range_ok = range_ok and (0 <= tmp_csc_params[i] <= 255) + if (range_ok is False): + logger.error("csc_params in config file out of range, please check it!") + raise RuntimeError('wrong aipp config file content!') + else: + logger.error("csc_switch in config file out of range, please check it!") + raise RuntimeError('wrong aipp config file content!') + + self.session.aipp_set_csc_params(tmp_csc_params) + + def aipp_set_crop_params(self, cfg, option_list): + if (option_list.count('crop') == 0): + tmp_crop_switch = CROP_SWITCH_OFF + else: + tmp_crop_switch = cfg.getint('aipp_op', 'crop') + + if (tmp_crop_switch == CROP_SWITCH_OFF): + tmp_crop_params = [0, 0, 0, 416, 416] + elif (tmp_crop_switch == CROP_SWITCH_ON): + tmp_crop_params = list() + tmp_crop_params.append(tmp_crop_switch) + tmp_crop_params.append( + 0 if option_list.count('load_start_pos_w') == 0 else cfg.getint('aipp_op', 'load_start_pos_w') + ) + tmp_crop_params.append( + 0 if option_list.count('load_start_pos_h') == 0 else cfg.getint('aipp_op', 'load_start_pos_h') + ) + tmp_crop_params.append( + 0 if option_list.count('crop_size_w') == 0 else cfg.getint('aipp_op', 'crop_size_w') + ) + tmp_crop_params.append( + 0 if option_list.count('crop_size_h') == 0 else cfg.getint('aipp_op', 'crop_size_h') + ) + + range_ok = True + range_ok = range_ok and (LOAD_START_POS_W_MIN <= tmp_crop_params[1] <= LOAD_START_POS_W_MAX) + range_ok = range_ok and (LOAD_START_POS_H_MIN <= tmp_crop_params[2] <= LOAD_START_POS_H_MAX) + range_ok = range_ok and (CROP_POS_W_MIN <= tmp_crop_params[3] <= CROP_POS_W_MAX) + range_ok = range_ok and (CROP_POS_H_MIN <= tmp_crop_params[4] <= CROP_POS_H_MAX) + if (range_ok is False): + logger.error("crop_params in config file out of range, please check it!") + raise RuntimeError('wrong aipp config file content!') + else: + logger.error("crop_switch(crop) in config file out of range, please check it!") + raise RuntimeError('wrong aipp config file content!') + + self.session.aipp_set_crop_params(tmp_crop_params) + + def aipp_set_padding_params(self, cfg, option_list): + if (option_list.count('padding') == 0): + tmp_padding_switch = PADDING_SWITCH_OFF + else: + tmp_padding_switch = cfg.getint('aipp_op', 'padding') + + if (tmp_padding_switch == PADDING_SWITCH_OFF): + tmp_padding_params = [0] * 5 + elif (tmp_padding_switch == PADDING_SWITCH_ON): + tmp_padding_params = list() + tmp_padding_params.append(tmp_padding_switch) + tmp_padding_params.append( + 0 if option_list.count('padding_size_top') == 0 else cfg.getint('aipp_op', 'padding_size_top') + ) + tmp_padding_params.append( + 0 if option_list.count('padding_size_bottom') == 0 else cfg.getint('aipp_op', 'padding_size_bottom') + ) + tmp_padding_params.append( + 0 if option_list.count('padding_size_left') == 0 else cfg.getint('aipp_op', 'padding_size_left') + ) + tmp_padding_params.append( + 0 if option_list.count('padding_size_right') == 0 else cfg.getint('aipp_op', 'padding_size_right') + ) + + range_ok = True + for i in range(1, 5): + range_ok = range_ok and (PADDING_SIZE_MIN <= tmp_padding_params[i] <= PADDING_SIZE_MAX) + if (range_ok is False): + logger.error("padding_params in config file out of range, please check it!") + raise RuntimeError('wrong aipp config file content!') + else: + logger.error("padding_switch in config file out of range, please check it!") + raise RuntimeError('wrong aipp config file content!') + + 
self.session.aipp_set_padding_params(tmp_padding_params) + + def aipp_set_dtc_pixel_mean(self, cfg, option_list): + tmp_mean_params = list() + tmp_mean_params.append( + 0 if option_list.count('mean_chn_0') == 0 else cfg.getint('aipp_op', 'mean_chn_0') + ) + tmp_mean_params.append( + 0 if option_list.count('mean_chn_1') == 0 else cfg.getint('aipp_op', 'mean_chn_1') + ) + tmp_mean_params.append( + 0 if option_list.count('mean_chn_2') == 0 else cfg.getint('aipp_op', 'mean_chn_2') + ) + tmp_mean_params.append( + 0 if option_list.count('mean_chn_3') == 0 else cfg.getint('aipp_op', 'mean_chn_3') + ) + + range_ok = True + for i in range(0, 4): + range_ok = range_ok and (PIXEL_MEAN_CHN_MIN <= tmp_mean_params[i] <= PIXEL_MEAN_CHN_MAX) + if (range_ok is False): + logger.error("mean_chn_params in config file out of range, please check it!") + raise RuntimeError('wrong aipp config file content!') + + self.session.aipp_set_dtc_pixel_mean(tmp_mean_params) + + def aipp_set_dtc_pixel_min(self, cfg, option_list): + tmp_min_params = list() + tmp_min_params.append( + 0 if option_list.count('min_chn_0') == 0 else cfg.getfloat('aipp_op', 'min_chn_0') + ) + tmp_min_params.append( + 0 if option_list.count('min_chn_1') == 0 else cfg.getfloat('aipp_op', 'min_chn_1') + ) + tmp_min_params.append( + 0 if option_list.count('min_chn_2') == 0 else cfg.getfloat('aipp_op', 'min_chn_2') + ) + tmp_min_params.append( + 0 if option_list.count('min_chn_3') == 0 else cfg.getfloat('aipp_op', 'min_chn_3') + ) + + range_ok = True + for i in range(0, 4): + range_ok = range_ok and (PIXEL_MIN_CHN_MIN <= tmp_min_params[i] <= PIXEL_MIN_CHN_MAX) + if (range_ok is False): + logger.error("min_chn_params in config file out of range, please check it!") + raise RuntimeError('wrong aipp config file content!') + + self.session.aipp_set_dtc_pixel_min(tmp_min_params) + + def aipp_set_pixel_var_reci(self, cfg, option_list): + tmp_reci_params = list() + tmp_reci_params.append( + 0 if option_list.count('var_reci_chn_0') == 0 else cfg.getfloat('aipp_op', 'var_reci_chn_0') + ) + tmp_reci_params.append( + 0 if option_list.count('var_reci_chn_1') == 0 else cfg.getfloat('aipp_op', 'var_reci_chn_1') + ) + tmp_reci_params.append( + 0 if option_list.count('var_reci_chn_2') == 0 else cfg.getfloat('aipp_op', 'var_reci_chn_2') + ) + tmp_reci_params.append( + 0 if option_list.count('var_reci_chn_3') == 0 else cfg.getfloat('aipp_op', 'var_reci_chn_3') + ) + + range_ok = True + for i in range(0, 4): + range_ok = range_ok and (PIXEL_VAR_RECI_CHN_MIN <= tmp_reci_params[i] <= PIXEL_VAR_RECI_CHN_MAX) + if (range_ok is False): + logger.error("var_reci_chn_params in config file out of range, please check it!") + raise RuntimeError('wrong aipp config file content!') + + self.session.aipp_set_pixel_var_reci(tmp_reci_params) + + def run(self, feeds, out_array=False): + if len(feeds) > 0 and isinstance(feeds[0], np.ndarray): + # if feeds is ndarray list, convert to baseTensor + inputs = [] + for array in feeds: + basetensor = aclruntime.BaseTensor(array.__array_interface__['data'][0], array.nbytes) + inputs.append(basetensor) + else: + inputs = feeds + outputs = self.session.run(self.outputs_names, inputs) + if out_array: + # convert to host tensor + self.convert_tensors_to_host(outputs) + # convert tensor to narray + return self.convert_tensors_to_arrays(outputs) + else: + return outputs + + def run_pipeline(self, infilelist, output, auto_shape=False, + auto_dims=False, outfmt="BIN", pure_infer_mode=False, extra_session=None): + infer_options = aclruntime.infer_options() 
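+        # pipeline options: where results are written, dynamic shape/dims handling,
+        # output file format, and whether pure-inference (constructed input) mode is used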
+ infer_options.output_dir = output + infer_options.auto_dym_shape = auto_shape + infer_options.auto_dym_dims = auto_dims + infer_options.out_format = outfmt + infer_options.pure_infer_mode = pure_infer_mode + extra_session = [] if extra_session is None else extra_session + self.session.run_pipeline(infilelist, infer_options, extra_session) + + def reset_summaryinfo(self): + self.session.reset_sumaryinfo() + + def infer(self, feeds, mode='static', custom_sizes=100000, out_array=True): + ''' + Parameters: + feeds: input data + mode: static dymdims dymshape... + ''' + inputs = [] + shapes = [] + for feed in feeds: + if type(feed) is np.ndarray: + infer_input = feed + if not infer_input.flags.c_contiguous: + infer_input = np.ascontiguousarray(infer_input) + shapes.append(infer_input.shape) + elif type(feed) in NP_TYPE_LIST: + infer_input = np.array(feed) + if not infer_input.flags.c_contiguous: + infer_input = np.ascontiguousarray(infer_input) + shapes.append([feed.size]) + elif type(feed) is aclruntime.Tensor: + infer_input = feed + shapes.append(infer_input.shape) + elif hasattr(feed, 'type') and feed.type() in TORCH_TENSOR_LIST: + infer_input = feed.numpy() + if not feed.is_contiguous(): + infer_input = np.ascontiguousarray(infer_input) + shapes.append(infer_input.shape) + else: + raise RuntimeError('type:{} invalid'.format(type(feed))) + inputs.append(infer_input) + + if self.infer_mode_switch.get(mode) is not None: + self.infer_mode_switch.get(mode)(shapes, custom_sizes) + else: + raise RuntimeError('wrong infer_mode:{}, only support \"static\",\"dymbatch\",\"dymhw\", \ + \"dymdims\",\"dymshape\"'.format(mode)) + + return self.run(inputs, out_array) + + def free_resource(self): + if hasattr(self.session, "free_resource"): + self.session.free_resource() + + def infer_pipeline(self, feeds_list, mode='static', custom_sizes=100000): + ''' + Parameters: + feeds_list: input data list + mode: static dymdims dymshape... 
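+            custom_sizes: output buffer size(s), only used when mode is "dymshape" (int or list of int)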
+ ''' + inputs_list = [] + shapes_list = [] + for feeds in feeds_list: + inputs = [] + shapes = [] + for feed in feeds: + if type(feed) is np.ndarray: + infer_input = feed + if not infer_input.flags.c_contiguous: + infer_input = np.ascontiguousarray(infer_input) + shape = feed.shape + elif type(feed) in NP_TYPE_LIST: + infer_input = np.array(feed) + if not infer_input.flags.c_contiguous: + infer_input = np.ascontiguousarray(infer_input) + shape = [feed.size] + elif type(feed) is aclruntime.Tensor: + infer_input = np.array(feed) + shape = infer_input.shape + elif hasattr(feed, 'type') and feed.type() in TORCH_TENSOR_LIST: + infer_input = feed.numpy() + infer_input = np.ascontiguousarray(infer_input) if not feed.is_contiguous() else infer_input + shape = infer_input.shape + else: + raise RuntimeError('type:{} invalid'.format(type(feed))) + basetensor = aclruntime.BaseTensor(infer_input.__array_interface__['data'][0], infer_input.nbytes) + inputs.append(basetensor) + shapes.append(shape) + inputs_list.append(inputs) + shapes_list.append(shapes) + if self.infer_mode_switch.get(mode) is not None and mode != "dymshape" and mode != "dymdims": + self.infer_mode_switch.get(mode)(shapes, custom_sizes) + elif mode == "dymshape": + if isinstance(custom_sizes, int): + custom_sizes = [custom_sizes] * len(self.get_outputs()) + elif not isinstance(custom_sizes, list): + raise RuntimeError('custom_sizes:{} type:{} invalid'.format( + custom_sizes, type(custom_sizes))) + self.session.set_custom_outsize(custom_sizes) + elif mode == "dymdims": + pass + else: + raise RuntimeError('wrong infer_mode:{}, only support \"static\",\"dymbatch\",\"dymhw\", \ + \"dymdims\",\"dymshape\"'.format(mode)) + outputs = self.session.run_pipeline(self.outputs_names, inputs_list, shapes_list, + mode == 'dymshape', mode == 'dymdims') + for i, output in enumerate(outputs): + outputs[i] = self.convert_tensors_to_arrays(output) + return outputs + + def inner_run(self, in_out_list, get_outputs=False, mem_copy=True): + ''' + Parameters: + in_out_list: relation between current input datas and last output datas + get_outputs: get outputs from device or not + mem_copy: the way inputs get data from outputs + ''' + if (get_outputs): + outputs = self.session.inner_run(in_out_list, self.outputs_names, get_outputs, mem_copy) + return outputs + else: + self.session.inner_run(in_out_list, self.outputs_names, get_outputs, mem_copy) + outputs = None + return outputs + + def first_inner_run(self, feeds, mode='static', custom_sizes=100000): + ''' + Parameters: + feeds: input data + mode: static dymdims dymshapes ... 
+ custom_sizes: must equal to the realsize of outputs + ''' + inputs = [] + shapes = [] + for feed in feeds: + if type(feed) is np.ndarray: + infer_input = feed + if not infer_input.flags.c_contiguous: + infer_input = np.ascontiguousarray(infer_input) + shapes.append(infer_input.shape) + elif type(feed) in NP_TYPE_LIST: + infer_input = np.array(feed) + if not infer_input.flags.c_contiguous: + infer_input = np.ascontiguousarray(infer_input) + shapes.append([feed.size]) + elif hasattr(feed, 'type') and feed.type() in TORCH_TENSOR_LIST: + infer_input = feed.numpy() + if not feed.is_contiguous(): + infer_input = np.ascontiguousarray(infer_input) + shapes.append(infer_input.shape) + else: + raise RuntimeError('type:{} invalid'.format(type(feed))) + basetensor = aclruntime.BaseTensor(infer_input.__array_interface__['data'][0], infer_input.nbytes) + inputs.append(basetensor) + + if self.infer_mode_switch.get(mode) is not None: + self.infer_mode_switch.get(mode)(shapes, custom_sizes) + else: + raise RuntimeError('wrong infer_mode:{}, only support \"static\",\"dymbatch\",\"dymhw\", \ + \"dymdims\",\"dymshape\"'.format(mode)) + + return self.session.first_inner_run(self.outputs_names, inputs) + + def infer_iteration(self, feeds, in_out_list=None, iteration_times=1, mode='static', + custom_sizes=100000, mem_copy=True): + ''' + Parameters: + feeds: input datas + in_out_list: relation between current input datas and last output datas + iteration_times: inner iteration infer loop times + mode: static dymdims dymshape ... + custom_sizes: only dymshape needs + ''' + if not in_out_list: + in_out_list = [] + if len(in_out_list) != len(self.get_inputs()): + raise RuntimeError(f"inputs' amount and length of in_out_list not matched!") + if (iteration_times == 1): + outputs = self.infer(feeds, mode, custom_sizes) + return outputs + else: + self.first_inner_run(feeds, mode, custom_sizes) + for _ in range(iteration_times - 2): + self.inner_run(in_out_list, False, mem_copy) + outputs = self.inner_run(in_out_list, True, mem_copy) + # convert to host tensor + self.convert_tensors_to_host(outputs) + # convert tensor to narray + return self.convert_tensors_to_arrays(outputs) + + def summary(self): + return self.session.sumary() + + def _static_prepare(self, shapes, custom_sizes): + self.set_staticbatch() + + def _dymbatch_prepare(self, shapes, custom_sizes): + indesc = self.get_inputs() + if (len(shapes) != len(indesc)): + raise RuntimeError("input datas and intensors nums not matched!") + for i, shape in enumerate(shapes): + for j, dim in enumerate(shape): + if (indesc[i].shape[j] < 0): + self.set_dynamic_batchsize(dim) + return + if (indesc[i].shape[j] != dim): + raise RuntimeError("input datas and intensors dim not matched!") + raise RuntimeError("not a dymbatch model!") + + def _dymhw_prepare(self, shapes, custom_sizes): + indesc = self.get_inputs() + if (len(shapes) != len(indesc)): + raise RuntimeError("input datas and intensors nums not matched!") + for i, shape in enumerate(shapes): + if (indesc[i].shape[2] < 0 and indesc[i].shape[3] < 0): + self.set_dynamic_hw(shape[2], shape[3]) + return + raise RuntimeError("not a dymhw model!") + + def _dymdims_prepare(self, shapes, custom_sizes): + dym_list = [] + indesc = self.get_inputs() + if (len(shapes) != len(indesc)): + raise RuntimeError("input datas and intensors nums not matched!") + for i, shape in enumerate(shapes): + str_shape = [str(val) for val in shape] + dyshape = "{}:{}".format(indesc[i].name, ",".join(str_shape)) + dym_list.append(dyshape) + dyshapes = 
';'.join(dym_list) + self.session.set_dynamic_dims(dyshapes) + + def _dymshape_prepare(self, shapes, custom_sizes): + dym_list = [] + indesc = self.get_inputs() + if (len(shapes) != len(indesc)): + raise RuntimeError("input datas and intensors nums not matched!") + outdesc = self.get_outputs() + for i, shape in enumerate(shapes): + str_shape = [str(val) for val in shape] + dyshape = "{}:{}".format(indesc[i].name, ",".join(str_shape)) + dym_list.append(dyshape) + dyshapes = ';'.join(dym_list) + self.session.set_dynamic_shape(dyshapes) + if isinstance(custom_sizes, int): + custom_sizes = [custom_sizes] * len(outdesc) + elif not isinstance(custom_sizes, list): + raise RuntimeError('custom_sizes:{} type:{} invalid'.format( + custom_sizes, type(custom_sizes))) + self.session.set_custom_outsize(custom_sizes) + + +class MultiDeviceSession(): + def __init__(self, model_path: str, acl_json_path: str = None, debug: bool = False, loop: int = 1): + self.model_path = model_path + self.acl_json_path = acl_json_path + self.debug = debug + self.loop = loop + self.summary = {} + + @classmethod + def print_subprocess_run_error(cls, value): + logger.error(f"subprocess run failed error_callback:{value}") + + def summary(self): + return self.summary + + def infer(self, device_feeds:dict, mode='static', custom_sizes=100000): + ''' + Parameters: + device_feeds: device match [input datas1, input datas2...] (Dict) + ''' + subprocess_num = 0 + for _, device in device_feeds.items(): + subprocess_num += len(device) + p = Pool(subprocess_num) + outputs_queue = Manager().Queue() + for device_id, feeds in device_feeds.items(): + for feed in feeds: + p.apply_async( + self.subprocess_infer, + args=(outputs_queue, device_id, feed, mode, custom_sizes), + error_callback=self.print_subprocess_run_error + ) + p.close() + p.join() + result = 0 if 2 * len(device_feeds) == outputs_queue.qsize() else 1 + logger.info(f"multidevice run end qsize:{outputs_queue.qsize()} result:{result}") + outputs_dict = {} + self.summary.clear() + while outputs_queue.qsize() != 0: + ret = outputs_queue.get() + if type(ret) == list: + if (not outputs_dict.get(ret[0])): + outputs_dict.update({ret[0]: []}) + self.summary.update({ret[0]: []}) + outputs_dict.get(ret[0]).append(ret[1]) + self.summary.get(ret[0]).append((ret[3] - ret[2]) * 1000) + logger.info(f"device {ret[0]}, start_time:{ret[2]}, end_time:{ret[3]}") + return outputs_dict + + def infer_pipeline(self, device_feeds_list:dict, mode='static', custom_sizes=100000): + ''' + Parameters: + device_feeds: device match [input datas1, input datas2...] 
(Dict) + ''' + subprocess_num = 0 + for _, device in device_feeds_list.items(): + subprocess_num += len(device) + p = Pool(subprocess_num) + outputs_queue = Manager().Queue() + for device_id, feeds in device_feeds_list.items(): + for feed in feeds: + p.apply_async( + self.subprocess_infer_pipeline, + args=(outputs_queue, device_id, feed, mode, custom_sizes), + error_callback=self.print_subprocess_run_error + ) + p.close() + p.join() + result = 0 if 2 * len(device_feeds_list) == outputs_queue.qsize() else 1 + logger.info(f"multidevice run pipeline end qsize:{outputs_queue.qsize()} result:{result}") + outputs_dict = {} + self.summary.clear() + while outputs_queue.qsize() != 0: + ret = outputs_queue.get() + if type(ret) == list: + if (not outputs_dict.get(ret[0])): + outputs_dict.update({ret[0]: []}) + self.summary.update({ret[0]: []}) + outputs_dict.get(ret[0]).append(ret[1]) + self.summary.get(ret[0]).append((ret[3] - ret[2]) * 1000) + logger.info(f"device {ret[0]}, start_time:{ret[2]}, end_time:{ret[3]}") + return outputs_dict + + def infer_iteration(self, device_feeds:dict, in_out_list=None, iteration_times=1, mode='static', custom_sizes=None, mem_copy=True): + ''' + Parameters: + device_feeds: device match [input datas1, input datas2...] (Dict) + ''' + subprocess_num = 0 + for _, device in device_feeds.items(): + subprocess_num += len(device) + p = Pool(subprocess_num) + outputs_queue = Manager().Queue() + for device_id, feeds in device_feeds.items(): + for feed in feeds: + p.apply_async( + self.subprocess_infer_iteration, + args=(outputs_queue, device_id, feed, in_out_list, iteration_times, mode, custom_sizes, mem_copy), + error_callback=self.print_subprocess_run_error + ) + p.close() + p.join() + result = 0 if 2 * len(device_feeds) == outputs_queue.qsize() else 1 + logger.info(f"multidevice run iteration end qsize:{outputs_queue.qsize()} result:{result}") + outputs_dict = {} + self.summary.clear() + while outputs_queue.qsize() != 0: + ret = outputs_queue.get() + if type(ret) == list: + if (not outputs_dict.get(ret[0])): + outputs_dict.update({ret[0]: []}) + self.summary.update({ret[0]: []}) + outputs_dict.get(ret[0]).append(ret[1]) + self.summary.get(ret[0]).append((ret[3] - ret[2]) * 1000) + logger.info(f"device {ret[0]}, start_time:{ret[2]}, end_time:{ret[3]}") + return outputs_dict + + def subprocess_infer(self, outputs_queue, device_id, feeds, mode='static', custom_sizes=100000): + sub_session = InferSession( + device_id=device_id, + model_path=self.model_path, + acl_json_path=self.acl_json_path, + debug=self.debug, + loop=self.loop + ) + start_time = time.time() + outputs = sub_session.infer(feeds, mode, custom_sizes, out_array=True) + end_time = time.time() + outputs_queue.put([device_id, outputs, start_time, end_time]) + return + + def subprocess_infer_pipeline(self, outputs_queue, device_id, feeds_list, mode='static', custom_sizes=100000): + sub_session = InferSession( + device_id=device_id, + model_path=self.model_path, + acl_json_path=self.acl_json_path, + debug=self.debug, + loop=self.loop + ) + start_time = time.time() + outputs = sub_session.infer_pipeline(feeds_list, mode, custom_sizes) + end_time = time.time() + outputs_queue.put([device_id, outputs, start_time, end_time]) + return + + def subprocess_infer_iteration(self, outputs_queue, device_id, feeds, in_out_list=None, + iteration_times=1, mode='static', custom_sizes=None, mem_copy=True): + sub_session = InferSession( + device_id=device_id, + model_path=self.model_path, + acl_json_path=self.acl_json_path, + 
debug=self.debug, + loop=self.loop + ) + start_time = time.time() + outputs = sub_session.infer_iteration(feeds, in_out_list, iteration_times, mode, custom_sizes, mem_copy) + end_time = time.time() + outputs_queue.put([device_id, outputs, start_time, end_time]) + return + + +class MemorySummary: + @staticmethod + def get_h2d_time_list(): + if hasattr(aclruntime, 'MemorySummary'): + return aclruntime.MemorySummary().H2D_time_list + else: + return [] + + @staticmethod + def get_d2h_time_list(): + if hasattr(aclruntime, 'MemorySummary'): + return aclruntime.MemorySummary().D2H_time_list + else: + return [] + + @staticmethod + def reset(): + if hasattr(aclruntime, 'MemorySummary'): + aclruntime.MemorySummary().reset() diff --git a/tools/infer_tool/ais_bench/infer/registry.py b/tools/infer_tool/ais_bench/infer/registry.py new file mode 100644 index 0000000000000000000000000000000000000000..60f4784c6fb674bd413411154290fbb1dcf774c3 --- /dev/null +++ b/tools/infer_tool/ais_bench/infer/registry.py @@ -0,0 +1,103 @@ +# Copyright (c) 2023-2023 Huawei Technologies Co., Ltd. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +import logging +from typing import Any, Dict, Iterable, Iterator, Tuple +from ais_bench.infer.common.utils import logger + + +class Registry(Iterable[Tuple[str, Any]]): + """ + The registry that provides name -> object mapping, to support third-party + users' custom modules. + """ + def register(self, obj: Any = None) -> Any: + """ + Register the given object under the the name `obj.__name__`. + Can be used as either a decorator or not.See docstring of this class for usage. + """ + if callable(obj): + return add(None, obj) + + def add(name: str, obj: Any) -> Any: + self[name] = obj + return obj + + return lambda x: add(obj, x) + + def __init__(self, name: str) -> None: + """ + Args: + name (str): the name of this registry + """ + self._name: str = name + self._obj_map: Dict[str, Any] = {} + + def __setitem__(self, name: str, obj: Any) -> None: + if not callable(obj): + raise ValueError("Value of a Registry must be a callable!") + + if name is None: + name = obj.__name__ + + if name in self._obj_map: + raise ValueError( + f"An object named '{name}' was already registered in '{self._name}' registry!" 
+ ) + self._obj_map[name] = obj + + def __getitem__(self, name: str) -> Any: + return self._obj_map[name] + + def __call__(self, obj: Any) -> Any: + return self.register(obj) + + def __contains__(self, name: str) -> bool: + return name in self._obj_map + + def __repr__(self) -> str: + from tabulate import tabulate + + table_headers = ["Names", "Objects"] + table = tabulate( + self._obj_map.items(), headers=table_headers, tablefmt="fancy_grid" + ) + return "Registry of {}:\n".format(self._name) + table + + def __iter__(self) -> Iterator[Tuple[str, Any]]: + return iter(self._obj_map.items()) + + +def import_all_modules_for_register(module_paths, base_model_name): + import os + import importlib + + modules = [] + for _, _, files in os.walk(module_paths): + for filename in files: + if not filename.endswith(".py") or filename == "__init__.py": + continue + model_name = base_model_name + "." + filename.rsplit(".", 1)[0] + modules.append(model_name) + + errors = [] + for module in modules: + try: + importlib.import_module(module) + except ImportError as e: + errors.append((module, e)) + logger.info(f"import {module} error: {e}") + + return errors \ No newline at end of file diff --git a/tools/infer_tool/ais_bench/infer/summary.py b/tools/infer_tool/ais_bench/infer/summary.py new file mode 100644 index 0000000000000000000000000000000000000000..65d1cb8e8729e8cad6dcbc0e6ed2e40b406a6a19 --- /dev/null +++ b/tools/infer_tool/ais_bench/infer/summary.py @@ -0,0 +1,229 @@ +# Copyright (c) 2023-2023 Huawei Technologies Co., Ltd. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
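+# Collects per-inference timing samples (NPU compute, H2D/D2H copies), computes
+# min/max/mean/median/percentile statistics and throughput, and optionally writes
+# a JSON summary report next to the inference outputs.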
+ + +import json +import os +import stat + +import numpy as np +from ais_bench.infer.common.utils import logger +from ais_bench.infer.common.path_security_check import ms_open + + +class ListInfo(object): + def __init__(self): + self.min = 0.0 + self.max = 0.0 + self.mean = 0.0 + self.median = 0.0 + self.percentile = 0.0 + + +class Result(object): + def __init__(self): + self.npu_compute_time = None + self.h2d_latency = None + self.d2h_latency = None + self.throughput = None + self.scale = None + self.batchsize = None + + +class Summary(object): + def __init__(self): + self.reset() + self.infodict = {"filesinfo": {}} + + @staticmethod + def merge_intervals(intervals): + intervals.sort(key=lambda x: x[0]) + merged = [] + for interval in intervals: + if not merged or merged[-1][1] < interval[0]: + merged.append(list(interval)) + else: + merged[-1][1] = max(merged[-1][1], interval[1]) + return merged + + @staticmethod + def get_list_info(work_list, percentile_scale, merge=False): + list_info = ListInfo() + if merge: # work_list is a 2-dim vector each element is a pair containing start and end time + n = len(work_list) + if n == 0: + raise RuntimeError(f'summary.get_list_info failed: inner error') + merged_intervals = Summary.merge_intervals(work_list) + sum_time = sum(end_time - start_time for start_time, end_time in merged_intervals) + list_info.mean = sum_time / n + + elif len(work_list) != 0: + list_info.min = np.min(work_list) + list_info.max = np.max(work_list) + list_info.mean = np.mean(work_list) + list_info.median = np.median(work_list) + list_info.percentile = np.percentile(work_list, percentile_scale) + + return list_info + + def reset(self): + self.h2d_latency_list = [] + self.d2h_latency_list = [] + self.npu_compute_time_list = [] + self.npu_compute_time_interval_list = [] + self._batchsizes = [] + + def add_batchsize(self, n: int): + self._batchsizes.append(n) + + def add_sample_id_infiles(self, sample_id, infiles): + if self.infodict["filesinfo"].get(sample_id) is None: + self.infodict["filesinfo"][sample_id] = {"infiles": [], "outfiles": []} + if len(self.infodict["filesinfo"][sample_id]["infiles"]) == 0: + for files in infiles: + self.infodict["filesinfo"][sample_id]["infiles"].append(files) + + def append_sample_id_outfile(self, sample_id, outfile): + if self.infodict["filesinfo"].get(sample_id) is None: + self.infodict["filesinfo"][sample_id] = {"infiles": [], "outfiles": []} + self.infodict["filesinfo"][sample_id]["outfiles"].append(outfile) + + def add_args(self, args): + self.infodict["args"] = args + + def record(self, result, multi_threads=False): + if multi_threads: + self.infodict['NPU_compute_time'] = { + "mean": result.npu_compute_time.mean, + "count": len(self.npu_compute_time_interval_list), + } + self.infodict['H2D_latency'] = {"mean": result.h2d_latency.mean, "count": len(self.h2d_latency_list)} + self.infodict['D2H_latency'] = {"mean": result.d2h_latency.mean, "count": len(self.d2h_latency_list)} + self.infodict['npu_compute_time_list'] = self.npu_compute_time_interval_list + else: + self.infodict['NPU_compute_time'] = { + "min": result.npu_compute_time.min, + "max": result.npu_compute_time.max, + "mean": result.npu_compute_time.mean, + "median": result.npu_compute_time.median, + "percentile({}%)".format(result.scale): result.npu_compute_time.percentile, + "count": len(self.npu_compute_time_list), + } + self.infodict['H2D_latency'] = { + "min": result.h2d_latency.min, + "max": result.h2d_latency.max, + "mean": result.h2d_latency.mean, + "median": 
result.h2d_latency.median, + "percentile({}%)".format(result.scale): result.h2d_latency.percentile, + "count": len(self.h2d_latency_list), + } + self.infodict['D2H_latency'] = { + "min": result.d2h_latency.min, + "max": result.d2h_latency.max, + "mean": result.d2h_latency.mean, + "median": result.d2h_latency.median, + "percentile({}%)".format(result.scale): result.d2h_latency.percentile, + "count": len(self.d2h_latency_list), + } + self.infodict['npu_compute_time_list'] = self.npu_compute_time_list + self.infodict['throughput'] = result.throughput + self.infodict['pid'] = os.getpid() + + def display(self, result, display_all_summary, multi_threads): + logger.info("-----------------Performance Summary------------------") + if multi_threads: + if display_all_summary is True: + logger.info("H2D_latency (ms): mean = {0}".format(result.h2d_latency.mean)) + logger.info("NPU_compute_time (ms): mean = {0}".format(result.npu_compute_time.mean)) + if display_all_summary is True: + logger.info("D2H_latency (ms): mean = {0}".format(result.d2h_latency.mean)) + else: + if display_all_summary is True: + logger.info( + "H2D_latency (ms): min = {0}, max = {1}, mean = {2}, median = {3}, percentile({4}%) = {5}".format( + result.h2d_latency.min, + result.h2d_latency.max, + result.h2d_latency.mean, + result.h2d_latency.median, + result.scale, + result.h2d_latency.percentile, + ) + ) + + logger.info( + "NPU_compute_time (ms): min = {0}, max = {1}, mean = {2}, median = {3}, percentile({4}%) = {5}".format( + result.npu_compute_time.min, + result.npu_compute_time.max, + result.npu_compute_time.mean, + result.npu_compute_time.median, + result.scale, + result.npu_compute_time.percentile, + ) + ) + if display_all_summary is True: + logger.info( + "D2H_latency (ms): min = {0}, max = {1}, mean = {2}, median = {3}, percentile({4}%) = {5}".format( + result.d2h_latency.min, + result.d2h_latency.max, + result.d2h_latency.mean, + result.d2h_latency.median, + result.scale, + result.d2h_latency.percentile, + ) + ) + logger.info( + "throughput 1000*batchsize.mean({})/NPU_compute_time.mean({}): {}".format( + result.batchsize, result.npu_compute_time.mean, result.throughput + ) + ) + logger.info("------------------------------------------------------") + + def report(self, batchsize, output_prefix, display_all_summary=False, multi_threads=False): + scale = 99 + + if self.npu_compute_time_list and self.npu_compute_time_interval_list: + logger.error("npu_compute_time_list and npu_compute_time_interval_list exits at the same time") + raise Exception + if self.npu_compute_time_list: + npu_compute_time = Summary.get_list_info(self.npu_compute_time_list, scale) + else: + npu_compute_time = Summary.get_list_info(self.npu_compute_time_interval_list, scale, True) + h2d_latency = Summary.get_list_info(self.h2d_latency_list, scale) + d2h_latency = Summary.get_list_info(self.d2h_latency_list, scale) + if self._batchsizes: + batchsize = sum(self._batchsizes) / len(self._batchsizes) + else: + pass + if npu_compute_time.mean == 0: + throughput = 0 + else: + throughput = 1000 * batchsize / npu_compute_time.mean + + result = Result() + result.npu_compute_time = npu_compute_time + result.d2h_latency = d2h_latency + result.h2d_latency = h2d_latency + result.throughput = throughput + result.scale = scale + result.batchsize = batchsize + + self.record(result, multi_threads) + self.display(result, display_all_summary, multi_threads) + + if output_prefix is not None: + with ms_open(output_prefix + "_summary.json", mode="w") as f: + 
json.dump(self.infodict, f) + + +summary = Summary() diff --git a/tools/infer_tool/requirements.txt b/tools/infer_tool/requirements.txt new file mode 100644 index 0000000000000000000000000000000000000000..e6094bfabeeab6ceaaa636f38b2f618ee081d560 --- /dev/null +++ b/tools/infer_tool/requirements.txt @@ -0,0 +1,3 @@ +numpy +tqdm +attrs >= 21.3.0 \ No newline at end of file diff --git a/tools/infer_tool/setup.py b/tools/infer_tool/setup.py new file mode 100644 index 0000000000000000000000000000000000000000..df023b7ee2daf2f4c72e96d58311024111f40618 --- /dev/null +++ b/tools/infer_tool/setup.py @@ -0,0 +1,51 @@ +# Copyright (c) 2023-2023 Huawei Technologies Co., Ltd. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import subprocess +from setuptools import setup, find_packages # type: ignore + + +with open('requirements.txt', encoding='utf-8') as f: + required = f.read().splitlines() + +with open('README.md', encoding='utf-8') as f: + long_description = f.read() + +# 使用Git命令获取最新的提交哈希 +try: + git_hash = subprocess.check_output(['git', 'rev-parse', 'HEAD']).decode('utf-8').strip() +except Exception: + git_hash = "" +# 使用Git命令获取最新的提交日期和时间 +try: + git_date = subprocess.check_output(['git', 'show', '-s', '--format=%cd', 'HEAD']).decode('utf-8').strip() +except Exception: + git_date = "" + +setup( + name='ais_bench', + version='0.0.2', + description='ais_bench tool', + long_description=long_description, + url=f"https://gitee.com/ascend/tools/, commit id: {git_hash}, release_date: {git_date}", + release_date = git_date, + packages=find_packages(), + include_package_data=True, + keywords='ais_bench tool', + install_requires=required, + python_requires='>=3.7', + entry_points={ + 'benchmark_sub_task': ['benchmark=ais_bench.infer.main_cli:get_cmd_instance'], + }, + +) \ No newline at end of file