From 91b61d1b25588f536f5647a320bc84c66947a837 Mon Sep 17 00:00:00 2001
From: wuhongfa <1660398197@qq.com>
Date: Wed, 24 Jul 2024 15:42:37 +0000
Subject: [PATCH 1/2] =?UTF-8?q?=E3=80=90FEAT=E3=80=91=E6=96=B0=E5=A2=9Eatt?=
 =?UTF-8?q?ention=20grad=E8=9E=8D=E5=90=88=E7=AE=97=E5=AD=90=20PART2?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 cust_op/attention_fusion_grad/README.md      | 175 +++++++++++++++++
 .../aclnn_attention_fusion_grad/inc/common.h |  49 +++++
 .../inc/op_runner.h                          | 184 ++++++++++++++++++
 .../inc/operator_desc.h                      |  59 ++++++
 .../aclnn_attention_fusion_grad/run.sh       |  91 +++++++++
 .../attention_fusion_grad.json               |  90 +++++++++
 cust_op/attention_fusion_grad/creat.sh       |  56 ++++++
 7 files changed, 704 insertions(+)
 create mode 100644 cust_op/attention_fusion_grad/README.md
 create mode 100644 cust_op/attention_fusion_grad/aclnn_attention_fusion_grad/inc/common.h
 create mode 100644 cust_op/attention_fusion_grad/aclnn_attention_fusion_grad/inc/op_runner.h
 create mode 100644 cust_op/attention_fusion_grad/aclnn_attention_fusion_grad/inc/operator_desc.h
 create mode 100755 cust_op/attention_fusion_grad/aclnn_attention_fusion_grad/run.sh
 create mode 100644 cust_op/attention_fusion_grad/attention_fusion_grad.json
 create mode 100755 cust_op/attention_fusion_grad/creat.sh

diff --git a/cust_op/attention_fusion_grad/README.md b/cust_op/attention_fusion_grad/README.md
new file mode 100644
index 00000000..69f4b57e
--- /dev/null
+++ b/cust_op/attention_fusion_grad/README.md
@@ -0,0 +1,175 @@
# AttentionGrad Fusion Operator and Sample Description

## AttentionGrad Fusion Operator File Structure

```shell
├── aclnn_attention_fusion_grad # Single-operator test case
├── attention_fusion_grad.json  # Operator prototype configuration
├── op_host                     # Host-side implementation of the AttentionGrad fusion operator
├── op_kernel                   # Kernel-side implementation of the AttentionGrad fusion operator
├── README.md                   # Documentation for the AttentionGrad fusion operator
└── creat.sh                    # Installation script for the AttentionGrad fusion operator
```

## Ascend C Reference Design

For more details, see the official CANN guide [Ascend C operator development](https://www.hiascend.com/document/detail/zh/canncommercial/70RC1/operatordev/Ascendcopdevg/atlas_ascendc_10_0001.html).

## Using the AttentionGrad Fusion Operator

1. Upload the attention_fusion_grad folder to the target environment, enter the directory, and run the following command to compile and deploy the fusion operator:

```shell
bash creat.sh
```

Note: set the CANN environment variables before compiling and installing the operator. With CANN installed in the default path, the command is:

```shell
source /usr/local/Ascend/ascend-toolkit/set_env.sh
```

2. Create the lazy_adam optimizer in the model script and request the fused implementation. Example:

```python
from mx_rec.optimizers.lazy_adam import create_hash_optimizer

# Pass "use_fusion_optim=True" when creating the lazy_adam optimizer to use the fused
# implementation; use_fusion_optim defaults to False.
# See the mxRec user guide for detailed usage of the lazy_adam optimizer.
sparse_optimizer = create_hash_optimizer(learning_rate=0.001, use_fusion_optim=True)
```

## AttentionGrad Fusion Operator Introduction

1. Operator analysis

a) The operator implements the backward computation of Attention.

b) Operator inputs:

* dout: gradient propagated back to the forward operator;
* softmax_out: output of the forward softmax;
* query: query matrix;
* key: key matrix;
* value: value matrix.

c) Operator outputs:

* grad_query: gradient of query;
* grad_key: gradient of key;
* grad_value: gradient of value.

d) Operator constraints:

* Supported products: Atlas A2 series;
* Supported CANN versions: 8.0.RC1 and later;
* Supported input data type: float32;
* All inputs share the same batch size, whose value lies in (0, 2000);
* The inputs satisfy the attention formula, and their shapes must be compatible with the corresponding matmul computations;
* Apart from the batch size, every input dimension lies in (0, 1000);
* The fusion operator performs best when the second dimension of key is a multiple of 8 and relatively large, e.g. greater than 500;
* The performance gain of the fusion operator applies when the free time between the small operators it replaces is long.

2. Host-side operator implementation

The host-side implementation is in the op_host directory.

a) Tiling

The tiling function in the `optiling` namespace reads the external arguments (input pointers and shape information) from the context and validates them. It then computes the data-partitioning parameters needed on the kernel side, including the softmax, matmul, UB-size, and batch parameters (see the comments in the tiling file for details), sets BlockDim, and finally passes the attributes through TilingData (a sketch of this flow follows at the end of this section).

b) Shape inference

Implements the function bodies that infer the output shape and DataType.

c) Prototype registration

Defines the operator prototype and registers the operator with GE.
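The host-side flow above follows the shape of the msopgen-generated template. The sketch below is illustrative only: the `AttentionFusionGradTilingData` field (`batchSize`), the input index, and the block-dim value are assumptions, and the real definitions live in the op_host tiling files.

```cpp
// Illustrative sketch of the host-side tiling flow (not the project's actual
// implementation); headers and most parameter computations are omitted.
namespace optiling {
static ge::graphStatus TilingFunc(gert::TilingContext* context)
{
    AttentionFusionGradTilingData tiling;  // assumed field set, for illustration
    // Fetch and validate an external argument from the context.
    const gert::StorageShape* queryShape = context->GetInputShape(2);  // input 2: query
    if (queryShape == nullptr) {
        return ge::GRAPH_FAILED;
    }
    tiling.set_batchSize(queryShape->GetStorageShape().GetDim(0));
    // ... compute the softmax/matmul/UB partitioning parameters here ...
    context->SetBlockDim(8);  // illustrative block dim
    // Serialize the parameters into TilingData for the kernel side.
    tiling.SaveToBuffer(context->GetRawTilingData()->GetData(),
                        context->GetRawTilingData()->GetCapacity());
    context->GetRawTilingData()->SetDataSize(tiling.GetDataSize());
    return ge::GRAPH_SUCCESS;
}
} // namespace optiling
```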
3. Kernel-side operator implementation

The kernel-side implementation is in the op_kernel directory and consists of attention_fusion_grad.cpp.

a) Kernel entry point: extern "C" __global__ __aicore__ void attention_fusion_grad

b) Tiling parsing: GET_TILING_DATA(tilingData, tiling) retrieves the data passed in from the host side via TilingData

c) The kernel calls AttentionFusionGradKernel to perform the computation; a skeleton of the entry point follows below.
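A minimal skeleton of that entry point is sketched below. The GM_ADDR parameter list is inferred from the operator prototype (five inputs, three outputs) plus the workspace and tiling pointers the framework appends; the Init/Process interface of AttentionFusionGradKernel is an assumption based on the usual Ascend C kernel pattern.

```cpp
// Skeleton of the kernel entry point (sketch). The exact parameter list and
// the AttentionFusionGradKernel interface are assumptions, not the project's
// verified code.
extern "C" __global__ __aicore__ void attention_fusion_grad(
    GM_ADDR dout, GM_ADDR softmax_out, GM_ADDR query, GM_ADDR key, GM_ADDR value,
    GM_ADDR grad_query, GM_ADDR grad_key, GM_ADDR grad_value,
    GM_ADDR workspace, GM_ADDR tiling)
{
    GET_TILING_DATA(tilingData, tiling);  // unpack the host-side TilingData
    AttentionFusionGradKernel op;
    op.Init(dout, softmax_out, query, key, value,
            grad_query, grad_key, grad_value, tilingData);
    op.Process();
}
```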
## AclNN Single-Operator Test Reference Design

For more details, see the official CANN guide [Ascend C single-operator invocation overview](https://www.hiascend.com/document/detail/zh/canncommercial/70RC1/operatordev/Ascendcopdevg/atlas_ascendc_10_0036.html).

There are two ways to invoke a single operator: single-operator API execution and model execution. mxRec provides a single-operator API execution sample for reference.

The single-operator test case is in the aclnn_attention_fusion_grad directory, where:

* inc holds the header files
* scripts holds the Python scripts that generate and verify data
* input holds the bin files with the operator's input arguments
* output holds the generated executable execute_attention_fusion_grad_op, the operator's output bin files, and the golden bin files used for verification
* src holds the common utilities (common), the operator input/output description class (operator_desc), the single-operator invocation flow (op_runner), and the entry point (main)

Run the single-operator test:

```shell
bash run.sh
```

### Prerequisites

1. Create the operator project by following [Creating an operator project with msopgen](https://www.hiascend.com/document/detail/zh/canncommercial/70RC1/operatordev/Ascendcopdevg/atlas_ascendc_10_0023.html), prepare the kernel-side implementation by following [Kernel-side operator implementation](https://www.hiascend.com/document/detail/zh/canncommercial/70RC1/operatordev/Ascendcopdevg/atlas_ascendc_10_0024.html), and prepare the host-side implementation by following [Host-side operator implementation](https://www.hiascend.com/document/detail/zh/canncommercial/70RC1/operatordev/Ascendcopdevg/atlas_ascendc_10_0026.html).

2. Compile and deploy the operator by following [Operator compilation and deployment](https://www.hiascend.com/document/detail/zh/canncommercial/70RC1/operatordev/Ascendcopdevg/atlas_ascendc_10_0031.html). Enable binary compilation during deployment: in the project's build configuration file CMakePresets.json, set ENABLE_BINARY_PACKAGE to True. Deploying the operator binary to the current environment makes it available for subsequent invocation.

### AclNN Invocation of the Attention Fusion Operator

The entry point is src/main.cpp:

1. InitResource: initializes AscendCL and acquires runtime resources; no modification needed.
2. RunLookupOp runs the operator:

a) CreateOpDesc creates the operator input/output description. The OperatorDesc class (inc/operator_desc.h) stores the operator arguments as member variables for later use in op_runner;

b) An OpRunner object is created and executes, in order:

* opRunner.Init(): allocates memory for the operator's input and output data
* SetInputData(): loads the input bin files and transfers them to OpRunner's buffers for the operator execution
* opRunner.RunOp(): runs the operator; the main steps are copying input data in, creating a Stream, executing the Stream, copying output data out, and releasing the Stream resources
* ProcessOutputData(): processes the operator's output data and writes it to disk for later comparison against the golden data

3. DestroyResource: releases memory; no modification needed. The sketch after this list shows the overall flow.
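A condensed sketch of how main.cpp chains these steps is shown below. CreateOpDesc, SetInputData, and ProcessOutputData are the sample's own helpers as described above; their exact signatures here are assumptions.

```cpp
// Sketch of the main.cpp flow (signatures of the helpers are assumed).
bool RunAttentionFusionGradOp()
{
    OperatorDesc opDesc = CreateOpDesc();  // describe the 5 inputs / 3 outputs
    OpRunner opRunner(&opDesc);
    if (!opRunner.Init()) {                // allocate host/device buffers
        ERROR_LOG("Init OpRunner failed");
        return false;
    }
    if (!SetInputData(opRunner)) {         // load input/*.bin into the buffers
        ERROR_LOG("Set input data failed");
        return false;
    }
    if (!opRunner.RunOp()) {               // copy in, run the stream, copy out
        ERROR_LOG("Run op failed");
        return false;
    }
    return ProcessOutputData(opRunner);    // write outputs for golden compare
}
```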
### Run Script

run.sh executes, in order:

1. Remove leftover generated files and log files
2. Generate the input data and the golden data
3. Build the acl executable
4. Run the executable
5. Compare the outputs against the golden files

### scripts Directory

* gen_data.py: generates the input data for the AttentionGrad fusion operator and the golden data used for precision verification; the test dim parameters can be modified as needed.
* verify_result.py: compares the operator outputs against the golden data generated by the script and reports the result. The comparison uses an error tolerance loss of 1e-4 applied to:

a) the absolute error
b) the relative error
c) the number of out-of-tolerance elements

The operator fails the precision check only when all of the following hold at once: not all absolute errors are below loss, not all relative errors are below loss, and the counts of elements whose absolute error and relative error exceed loss each surpass 1/loss of the total, i.e. 1/10000 (one in ten thousand). In every other case the operator passes.

Users can adjust the allowed error tolerance loss. The sketch below restates the rule in code.
\ No newline at end of file
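The following sketch restates that pass/fail rule, assuming equal-sized output and golden arrays. It is a restatement for clarity, not the actual verify_result.py implementation.

```cpp
// Restatement of the verify_result.py rule (sketch only). The operator fails
// only when all three criteria are breached at once.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

bool PrecisionPass(const std::vector<float>& out, const std::vector<float>& golden)
{
    const double loss = 1e-4;   // allowed error tolerance, user adjustable
    std::size_t absExceed = 0;  // elements whose absolute error exceeds loss
    std::size_t relExceed = 0;  // elements whose relative error exceeds loss
    for (std::size_t i = 0; i < out.size(); ++i) {
        const double absErr = std::fabs(static_cast<double>(out[i]) - golden[i]);
        const double relErr = absErr /
            std::max(std::fabs(static_cast<double>(golden[i])), 1e-9);
        if (absErr > loss) { ++absExceed; }
        if (relErr > loss) { ++relExceed; }
    }
    // "1/loss of the total" means size / 10000, i.e. size * loss.
    const double countLimit = out.size() * loss;
    const bool fail = absExceed > 0 && relExceed > 0 &&
                      absExceed > countLimit && relExceed > countLimit;
    return !fail;
}
```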
diff --git a/cust_op/attention_fusion_grad/aclnn_attention_fusion_grad/inc/common.h b/cust_op/attention_fusion_grad/aclnn_attention_fusion_grad/inc/common.h
new file mode 100644
index 00000000..225d5994
--- /dev/null
+++ b/cust_op/attention_fusion_grad/aclnn_attention_fusion_grad/inc/common.h
@@ -0,0 +1,49 @@
/**
 * @file common.h
 *
 * Copyright (C) 2024. Huawei Technologies Co., Ltd. All rights reserved.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 */
#ifndef COMMON_H
#define COMMON_H

#include <cstdint>
#include <cstdio>
#include <fstream>
#include <iostream>
#include <string>

#include "acl/acl.h"

#define SUCCESSED 0
#define FAILED 1

#define INFO_LOG(fmt, args...) fprintf(stdout, "[INFO] " fmt "\n", ##args)
#define WARN_LOG(fmt, args...) fprintf(stdout, "[WARN] " fmt "\n", ##args)
#define ERROR_LOG(fmt, args...) fprintf(stderr, "[ERROR] " fmt "\n", ##args)

namespace AttentionFusionGrad {
constexpr int NUM_TEST_EXEC = 100;
constexpr int TIME_OUT = 5000;
/**
 * @brief Read data from file
 * @param [in] filePath: file path
 * @param [in] fileSize: file size
 * @param [out] buffer: buffer to receive the file content
 * @param [in] bufferSize: buffer size
 * @return read result
 */
bool ReadFile(const std::string& filePath, size_t fileSize, void* buffer, size_t bufferSize);

/**
 * @brief Write data to file
 * @param [in] filePath: file path
 * @param [in] buffer: data to write to file
 * @param [in] size: size to write
 * @return write result
 */
bool WriteFile(const std::string& filePath, const void* buffer, size_t size);
} // namespace AttentionFusionGrad

#endif // COMMON_H
diff --git a/cust_op/attention_fusion_grad/aclnn_attention_fusion_grad/inc/op_runner.h b/cust_op/attention_fusion_grad/aclnn_attention_fusion_grad/inc/op_runner.h
new file mode 100644
index 00000000..8005fa64
--- /dev/null
+++ b/cust_op/attention_fusion_grad/aclnn_attention_fusion_grad/inc/op_runner.h
@@ -0,0 +1,184 @@
/**
* @file op_runner.h
*
* Copyright (C) 2024. Huawei Technologies Co., Ltd. All rights reserved.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
*/
#ifndef OP_RUNNER_H
#define OP_RUNNER_H

#include "aclnn/acl_meta.h"
#include "acl/acl.h"
#include "common.h"
#include "operator_desc.h"

namespace AttentionFusionGrad {
/**
 * Op Runner
 */
class OpRunner {
public:
    /**
     * @brief Constructor
     * @param [in] opDesc: op description
     */
    explicit OpRunner(OperatorDesc* opDesc);

    /**
     * @brief Destructor
     */
    virtual ~OpRunner();

    /**
     * @brief Init op runner
     */
    bool Init();

    /**
     * @brief Get number of inputs
     * @return number of inputs
     */
    const size_t NumInputs();

    /**
     * @brief Get number of outputs
     * @return number of outputs
     */
    const size_t NumOutputs();

    /**
     * @brief Get input size by index
     * @param [in] index: input index
     * @return size of the input
     */
    const size_t GetInputSize(size_t index) const;
    const size_t GetInputNumDims(size_t index) const;
    aclDataType GetInputDataType(size_t index) const;
    aclFormat GetInputFormat(size_t index) const;

    /**
     * @brief Get output size by index
     * @param [in] index: output index
     * @return size of the output
     */
    size_t GetOutputSize(size_t index) const;
    const size_t GetOutputNumDims(size_t index) const;
    aclDataType GetOutputDataType(size_t index) const;
    aclFormat GetOutputFormat(size_t index) const;

    /**
     * @brief Get input element count by index
     * @param [in] index: input index
     * @return element count of the input
     */
    size_t GetInputElementCount(size_t index) const;

    /**
     * @brief Get output element count by index
     * @param [in] index: output index
     * @return element count of the output
     */
    size_t GetOutputElementCount(size_t index) const;

    /**
     * @brief Get input shape by index
     * @param [in] index: input index
     * @return shape of the input
     */
    std::vector<int64_t> GetInputShape(size_t index) const;

    /**
     * @brief Get output shape by index
     * @param [in] index: output index
     * @return shape of the output
     */
    std::vector<int64_t> GetOutputShape(size_t index) const;

    /**
     * @brief Get input buffer (host memory) by index
     * @tparam T: data type
     * @param [in] index: input index
     * @return host address of the input
     */
    template<typename T>
    T *GetInputBuffer(size_t index)
    {
        if (index >= numInputs_) {
            ERROR_LOG("Index out of range. index = %zu, numInputs = %zu", index, numInputs_);
            return nullptr;
        }
        return reinterpret_cast<T *>(hostInputs_[index]);
    }

    /**
     * @brief Get output buffer (host memory) by index
     * @tparam T: data type
     * @param [in] index: output index
     * @return host address of the output
     */
    template<typename T>
    const T *GetOutputBuffer(size_t index)
    {
        if (index >= numOutputs_) {
            ERROR_LOG("Index out of range. index = %zu, numOutputs = %zu", index, numOutputs_);
            return nullptr;
        }

        return reinterpret_cast<const T *>(hostOutputs_[index]);
    }

    /**
     * @brief Print readable input by index
     * @param [in] index: input index
     * @param [in] elementsPerRow: number of elements per row
     */
    void PrintInput(size_t index, size_t elementsPerRow = 16);

    /**
     * @brief Print readable output by index
     * @param [in] index: output index
     * @param [in] elementsPerRow: number of elements per row
     */
    void PrintOutput(size_t index, size_t elementsPerRow = 16);

    /**
     * @brief Compile static op
     * @return compile result
     */
    bool CompileStaticOp();

    /**
     * @brief Compile dynamic op
     * @return compile result
     */
    bool CompileDynamicOp();

    /**
     * @brief Run op
     * @return run result
     */
    bool RunOp();

private:
    size_t numInputs_;
    size_t numOutputs_;

    std::vector<aclDataBuffer *> inputBuffers_;
    std::vector<aclDataBuffer *> outputBuffers_;

    std::vector<void *> devInputs_;
    std::vector<void *> devOutputs_;

    std::vector<void *> hostInputs_;
    std::vector<void *> hostOutputs_;

    std::vector<aclTensor *> inputTensor_;
    std::vector<aclTensor *> outputTensor_;
    OperatorDesc *opDesc_;
};
} // namespace AttentionFusionGrad

#endif // OP_RUNNER_H
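As a usage note for the header above, the sketch below shows how a caller might read an output back through the templated host-buffer accessors after RunOp(); the output index order (grad_query = 0, grad_key = 1, grad_value = 2) follows the operator prototype.

```cpp
// Usage sketch for OpRunner's accessors: read grad_query back after RunOp().
void DumpGradQuery(OpRunner& runner)
{
    const float* gradQuery = runner.GetOutputBuffer<float>(0);
    if (gradQuery == nullptr) {
        ERROR_LOG("grad_query host buffer is null");
        return;
    }
    size_t count = runner.GetOutputElementCount(0);
    INFO_LOG("grad_query: %zu elements, first = %f", count, gradQuery[0]);
    runner.PrintOutput(0);  // readable dump, 16 elements per row by default
}
```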
+ echo "#####################################" + echo "" + fi +} + +main diff --git a/cust_op/attention_fusion_grad/attention_fusion_grad.json b/cust_op/attention_fusion_grad/attention_fusion_grad.json new file mode 100644 index 00000000..704568e4 --- /dev/null +++ b/cust_op/attention_fusion_grad/attention_fusion_grad.json @@ -0,0 +1,90 @@ +[ + { + "op": "AttentionFusionGrad", + "language": "cpp", + "input_desc": [ + { + "name": "dout", + "param_type": "required", + "format": [ + "ND" + ], + "type": [ + "float" + ] + }, + { + "name": "softmax_out", + "param_type": "required", + "format": [ + "ND" + ], + "type": [ + "float" + ] + }, + { + "name": "query", + "param_type": "required", + "format": [ + "ND" + ], + "type": [ + "float" + ] + }, + { + "name": "key", + "param_type": "optional", + "format": [ + "ND" + ], + "type": [ + "float" + ] + }, + { + "name": "value", + "param_type": "optional", + "format": [ + "ND" + ], + "type": [ + "float" + ] + } + ], + "output_desc": [ + { + "name": "grad_query", + "param_type": "required", + "format": [ + "ND" + ], + "type": [ + "float" + ] + }, + { + "name": "grad_key", + "param_type": "required", + "format": [ + "ND" + ], + "type": [ + "float" + ] + }, + { + "name": "grad_value", + "param_type": "required", + "format": [ + "ND" + ], + "type": [ + "float" + ] + } + ] + } +] \ No newline at end of file diff --git a/cust_op/attention_fusion_grad/creat.sh b/cust_op/attention_fusion_grad/creat.sh new file mode 100755 index 00000000..2c186bbc --- /dev/null +++ b/cust_op/attention_fusion_grad/creat.sh @@ -0,0 +1,56 @@ +#!/bin/bash +# Copyright 2024. Huawei Technologies Co.,Ltd. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +set -e + + +# 查找msopgen的路径,加入到环境变量PATH中 +msopgen_path=$(find /usr/local/Ascend/ -name msopgen | grep bin) +parent_dir=$(dirname "$msopgen_path") +export PATH=$parent_dir:$PATH + +# 利用msopgen生成可编译文件 +rm -rf ./attention_fusion_grad +python3 /usr/local/Ascend/ascend-toolkit/latest/python/site-packages/bin/msopgen gen -i attention_fusion_grad.json -f tf -c ai_core-Ascend910B1 -lan cpp -out ./attention_fusion_grad -m 0 -op AttentionFusionGrad +rm -rf attention_fusion_grad/op_kernel +rm -rf attention_fusion_grad/host +cp -rf op_kernel attention_fusion_grad/ +cp -rf op_host attention_fusion_grad/ + +cd attention_fusion_grad + +# 判断当前目录下是否存在CMakePresets.json文件 +if [ ! -f "CMakePresets.json" ]; then + echo "ERROR, CMakePresets.json file not exist." 
diff --git a/cust_op/attention_fusion_grad/aclnn_attention_fusion_grad/run.sh b/cust_op/attention_fusion_grad/aclnn_attention_fusion_grad/run.sh
new file mode 100755
index 00000000..6793de82
--- /dev/null
+++ b/cust_op/attention_fusion_grad/aclnn_attention_fusion_grad/run.sh
@@ -0,0 +1,91 @@
#!/bin/bash
export ASCEND_SLOG_PRINT_TO_STDOUT=0
export ASCEND_GLOBAL_LOG_LEVEL=0

CURRENT_DIR=$(
    cd $(dirname ${BASH_SOURCE:-$0})
    pwd
)
cd $CURRENT_DIR

SHORT=v:,
LONG=dtype:,
OPTS=$(getopt -a --options $SHORT --longoptions $LONG -- "$@")
eval set -- "$OPTS"
while :
do
    case "$1" in
        # float16, float, int32
        (-v | --dtype)
            DTYPE="$2"
            shift 2;;
        (--)
            shift;
            break;;
        (*)
            echo "[ERROR] Unexpected option: $1";
            break;;
    esac
done

if [ ! $ASCEND_HOME_DIR ]; then
    if [ -d "$HOME/Ascend/ascend-toolkit/latest" ]; then
        export ASCEND_HOME_DIR=$HOME/Ascend/ascend-toolkit/latest
    else
        export ASCEND_HOME_DIR=/usr/local/Ascend/ascend-toolkit/latest
    fi
fi

export DDK_PATH=$ASCEND_HOME_DIR
arch=$(uname -m)
export NPU_HOST_LIB=$ASCEND_HOME_DIR/${arch}-linux/lib64

function main {
    rm -rf $HOME/ascend/log/*
    rm -f ./input/*.bin
    rm -f ./output/*.bin

    cd $CURRENT_DIR
    python3 scripts/gen_data.py
    if [ $? -ne 0 ]; then
        echo "ERROR: generate input data failed!"
        return 1
    fi
    echo "INFO: generate input data success!"

    cd $CURRENT_DIR; rm -rf build; mkdir -p build; cd build
    cmake ../src
    if [ $? -ne 0 ]; then
        echo "ERROR: cmake failed!"
        return 1
    fi
    echo "INFO: cmake success!"
    make
    if [ $? -ne 0 ]; then
        echo "ERROR: make failed!"
        return 1
    fi
    echo "INFO: make success!"

    cd $CURRENT_DIR/output
    echo "INFO: execute op!"
    ./execute_attention_fusion_grad_op

    if [ $? -ne 0 ]; then
        echo "ERROR: acl executable run failed! please check your project!"
        return 1
    fi
    echo "INFO: acl executable run success!"
    cd $CURRENT_DIR
    ret=`python3 scripts/verify_result.py output/grad_query.bin output/grad_key.bin output/grad_value.bin output/golden_grad_query.bin output/golden_grad_key.bin output/golden_grad_value.bin`
    echo $ret
    if [ "x$ret" == "xtest pass" ]; then
        echo ""
        echo "#####################################"
        echo "INFO: you have passed the Precision!"
        echo "#####################################"
        echo ""
    fi
}

main
diff --git a/cust_op/attention_fusion_grad/attention_fusion_grad.json b/cust_op/attention_fusion_grad/attention_fusion_grad.json
new file mode 100644
index 00000000..704568e4
--- /dev/null
+++ b/cust_op/attention_fusion_grad/attention_fusion_grad.json
@@ -0,0 +1,90 @@
[
    {
        "op": "AttentionFusionGrad",
        "language": "cpp",
        "input_desc": [
            {
                "name": "dout",
                "param_type": "required",
                "format": [
                    "ND"
                ],
                "type": [
                    "float"
                ]
            },
            {
                "name": "softmax_out",
                "param_type": "required",
                "format": [
                    "ND"
                ],
                "type": [
                    "float"
                ]
            },
            {
                "name": "query",
                "param_type": "required",
                "format": [
                    "ND"
                ],
                "type": [
                    "float"
                ]
            },
            {
                "name": "key",
                "param_type": "optional",
                "format": [
                    "ND"
                ],
                "type": [
                    "float"
                ]
            },
            {
                "name": "value",
                "param_type": "optional",
                "format": [
                    "ND"
                ],
                "type": [
                    "float"
                ]
            }
        ],
        "output_desc": [
            {
                "name": "grad_query",
                "param_type": "required",
                "format": [
                    "ND"
                ],
                "type": [
                    "float"
                ]
            },
            {
                "name": "grad_key",
                "param_type": "required",
                "format": [
                    "ND"
                ],
                "type": [
                    "float"
                ]
            },
            {
                "name": "grad_value",
                "param_type": "required",
                "format": [
                    "ND"
                ],
                "type": [
                    "float"
                ]
            }
        ]
    }
]
\ No newline at end of file
diff --git a/cust_op/attention_fusion_grad/creat.sh b/cust_op/attention_fusion_grad/creat.sh
new file mode 100755
index 00000000..2c186bbc
--- /dev/null
+++ b/cust_op/attention_fusion_grad/creat.sh
@@ -0,0 +1,56 @@
#!/bin/bash
# Copyright 2024. Huawei Technologies Co.,Ltd. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

set -e


# Locate msopgen and add its directory to PATH
msopgen_path=$(find /usr/local/Ascend/ -name msopgen | grep bin)
parent_dir=$(dirname "$msopgen_path")
export PATH=$parent_dir:$PATH

# Generate the buildable operator project with msopgen
rm -rf ./attention_fusion_grad
python3 /usr/local/Ascend/ascend-toolkit/latest/python/site-packages/bin/msopgen gen -i attention_fusion_grad.json -f tf -c ai_core-Ascend910B1 -lan cpp -out ./attention_fusion_grad -m 0 -op AttentionFusionGrad
rm -rf attention_fusion_grad/op_kernel
rm -rf attention_fusion_grad/op_host
cp -rf op_kernel attention_fusion_grad/
cp -rf op_host attention_fusion_grad/

cd attention_fusion_grad

# Check that CMakePresets.json exists in the current directory
if [ ! -f "CMakePresets.json" ]; then
    echo "ERROR: CMakePresets.json file does not exist."
    exit 1
fi

# Disable CRC checksum generation
sed -i 's/--nomd5/--nomd5 --nocrc/g' ./cmake/makeself.cmake

# Fix the CANN installation path
sed -i 's:"/usr/local/Ascend/latest":"/usr/local/Ascend/ascend-toolkit/latest":g' CMakePresets.json
# Change vendor_name so that previously deployed operators whose vendor_name is "customize" are not overwritten;
# vendor_name must stay in sync with the CUST_PKG_PATH value in the aclnn CMakeLists.txt, otherwise the aclnn invocation fails;
# vendor_name must not contain "customize"; otherwise, with multiple operators deployed, the config.ini content under CANN's vendors path is truncated incorrectly
sed -i 's:"customize":"attention_fusion_grad":g' CMakePresets.json

bash build.sh

# Install the built operator package
bash ./build_out/custom_opp*.run
cd ../aclnn_attention_fusion_grad
bash run.sh
\ No newline at end of file
-- 
Gitee

From 794b3eb889648caddc39fb732e51b9af931387ba Mon Sep 17 00:00:00 2001
From: dev
Date: Thu, 25 Jul 2024 01:36:20 +0800
Subject: [PATCH 2/2] =?UTF-8?q?=E3=80=90FEAT=E3=80=91=E6=96=B0=E5=A2=9Eatt?=
 =?UTF-8?q?ention=20grad=E8=9E=8D=E5=90=88=E7=AE=97=E5=AD=90=20PART2?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 .../aclnn_attention_fusion_grad/inc/op_runner.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/cust_op/attention_fusion_grad/aclnn_attention_fusion_grad/inc/op_runner.h b/cust_op/attention_fusion_grad/aclnn_attention_fusion_grad/inc/op_runner.h
index 8005fa64..bd3347ac 100644
--- a/cust_op/attention_fusion_grad/aclnn_attention_fusion_grad/inc/op_runner.h
+++ b/cust_op/attention_fusion_grad/aclnn_attention_fusion_grad/inc/op_runner.h
@@ -37,6 +37,11 @@ public:
      */
     bool Init();
 
+    /**
+     * @brief Init op runner output info
+     */
+    bool InitOutputInfo();
+
     /**
      * @brief Get number of inputs
      * @return number of inputs
-- 
Gitee