SQL Engine Query Optimization: Sort Cost in costsize.cpp · Pull Request !249 · openGauss/blog

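## Preface
This article covers how the cost and estimated output size of a recursive union are computed, and then analyzes the cost of sorting a relation. The sort cost also includes the cost of reading the input data, which is usually left out.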

## cost_sort
**Purpose:** Determines and returns the cost of sorting a relation, including the cost of reading the input data.
**sort_nums**
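* If the total volume of data to sort is less than sort_mem, we do an in-memory sort, which requires no I/O and about t*log2(t) tuple comparisons for t tuples.
* If the total volume exceeds sort_mem, we switch to a tape-style merge algorithm. There will still be about t*log2(t) tuple comparisons in total, but each tuple also has to be written and read once per merge pass. We expect about ceil(logM(r)) merge passes (the smallest integer ≥ logM(r)), where r is the number of initial runs and M is the merge order used by tuplesort.c.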
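The two quantities in the list above can be written down directly. The following is a minimal sketch, not code from openGauss: the function names `sort_comparisons` and `merge_passes` are hypothetical, and returning zero passes for a single run is an assumption made here for illustration.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper: about t * log2(t) tuple comparisons for t tuples.
// t is clamped to at least 2 so the estimate is never zero and log2(0)
// is never evaluated.
double sort_comparisons(double t)
{
    t = std::max(t, 2.0);
    return t * std::log2(t);
}

// Hypothetical helper: expected number of merge passes, ceil(logM(r)),
// for r initial runs and merge order M (assumed M > 1).
// Assumption of this sketch: a single run needs no merge pass at all.
// Note that floating-point rounding can matter when r is an exact power of M.
double merge_passes(double r, double M)
{
    if (r <= 1.0) {
        return 0.0;
    }
    return std::ceil(std::log(r) / std::log(M));
}
```

With a merge order of 6, for example, `merge_passes(4.0, 6.0)` evaluates to 1 and `merge_passes(10.0, 6.0)` evaluates to 2, because 10 runs cannot be merged in a single 6-way pass.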

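**Tape-style merge algorithm**
 * Because the average initial run should be about twice sort_mem, the number of initial runs r is estimated as p / (2*sort_mem), which gives:
 * disk traffic = 2 * relsize * ceil(logM(p / (2*sort_mem)))
 * cpu = comparison_cost * t * log2(t)
 * Disk traffic is assumed to be 3/4 sequential and 1/4 random access. Random access is better suited to small samples, a point already raised in the previous article on cost calculation; if this ratio was derived from analysis of large data sets as the more efficient choice, it is reasonable.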
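The disk term above can be illustrated with a small sketch. It is an illustration of the formula only and is not the body of openGauss's `compute_sort_disk_cost`; the function name, the 8 KB page size, and the two page-cost constants are assumptions chosen for the example.

```cpp
#include <cmath>

// Assumed values for illustration only; not read from any openGauss setting.
constexpr double kPageBytes = 8192.0;
constexpr double kSeqPageCost = 1.0;
constexpr double kRandomPageCost = 4.0;

// Hypothetical sketch of the disk cost of an external (tape-style) sort.
// input_bytes:    total volume of data to sort
// sort_mem_bytes: memory budget for the sort
// merge_order:    M, the number of runs merged in one pass
double sketch_sort_disk_cost(double input_bytes, double sort_mem_bytes, double merge_order)
{
    double pages = std::ceil(input_bytes / kPageBytes);
    // The average initial run is assumed to be about twice sort_mem.
    double runs = input_bytes / (2.0 * sort_mem_bytes);
    // Once the sort spills, assume at least one pass over the data.
    double passes = (runs > merge_order)
        ? std::ceil(std::log(runs) / std::log(merge_order))
        : 1.0;
    // Every pass writes and reads each page once.
    double page_accesses = 2.0 * pages * passes;
    // 3/4 of the traffic is charged as sequential, 1/4 as random.
    return page_accesses * (0.75 * kSeqPageCost + 0.25 * kRandomPageCost);
}
```

Under these assumed constants, sorting 1000 pages with 1 MB of sort memory and a merge order of 6 needs a single pass, so the estimate is 2000 page accesses at 1.75 each, or 3500.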

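**Bounded heap sort:** If the sort is bounded (that is, only the first k result tuples are needed) and k tuples fit into sort_mem, a heap method can be used that keeps only k tuples in the heap. This requires about t*log2(k) tuple comparisons, which is the time complexity of a bounded (top-k) heap sort.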
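To make the t*log2(k) figure concrete, here is a minimal top-k sketch built on `std::priority_queue`. It is not the tuplesort.c implementation; the function name `top_k_smallest` and the use of plain `int` values are assumptions for illustration. Each of the t inputs costs at most one heap push and one pop on a heap of at most k elements, which is where the log2(k) factor comes from.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical helper: return the k smallest values in ascending order,
// holding at most k values in memory at any time.
std::vector<int> top_k_smallest(const std::vector<int>& input, std::size_t k)
{
    // Max-heap: the largest of the retained values sits on top.
    std::priority_queue<int> heap;
    for (int v : input) {
        if (heap.size() < k) {
            heap.push(v);                   // heap not full yet: O(log k)
        } else if (k > 0 && v < heap.top()) {
            heap.pop();                     // evict the current largest
            heap.push(v);                   // keep the smaller value instead
        }
    }
    // Drain the heap from largest to smallest, filling the result backwards.
    std::vector<int> result(heap.size());
    for (std::size_t i = result.size(); i > 0; --i) {
        result[i - 1] = heap.top();
        heap.pop();
    }
    return result;
}
```

Calling `top_k_smallest({5, 1, 4, 2, 3}, 2)` returns `{1, 2}` while never holding more than two values in the heap.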

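**Extra cost (default charge):** By default, the system charges two operator evaluations per tuple comparison, which should be about right in most cases. The caller can adjust this by passing a nonzero comparison_cost; typically this accounts for any extra work needed to prepare the inputs to the comparison operator.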
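The three cases above, together with the default per-comparison charge, can be combined into one simplified CPU model. This is a sketch under stated assumptions rather than the real function: `sketch_sort_cpu_cost` and `kCpuOperatorCost` are hypothetical names, the byte size is approximated as tuples times width instead of calling `relation_byte_size`, parallelism (dop) is ignored, and the disk term of the external case is left out.

```cpp
#include <algorithm>
#include <cmath>

// Assumed value for illustration; not read from any configuration.
constexpr double kCpuOperatorCost = 0.0025;

enum class SortMethod { External, BoundedHeap, Quicksort };

struct SortCpuEstimate {
    SortMethod method;
    double cpu_cost;
};

// Hypothetical, simplified model of the CPU part of the sort estimate.
// limit_tuples <= 0 means the sort is not bounded.
SortCpuEstimate sketch_sort_cpu_cost(double tuples, double width_bytes, double sort_mem_bytes,
                                     double limit_tuples, double caller_comparison_cost)
{
    tuples = std::max(tuples, 2.0);  // never estimate zero, never take log2(0)
    // Default charge: two operator evaluations per comparison, plus any
    // extra cost supplied by the caller.
    double comparison_cost = caller_comparison_cost + 2.0 * kCpuOperatorCost;

    double input_bytes = tuples * width_bytes;
    bool bounded = limit_tuples > 0 && limit_tuples < tuples;
    double output_tuples = bounded ? limit_tuples : tuples;
    double output_bytes = output_tuples * width_bytes;

    if (output_bytes > sort_mem_bytes) {
        // Does not fit in memory: about N log2 N comparisons (disk cost omitted here).
        return {SortMethod::External, comparison_cost * tuples * std::log2(tuples)};
    }
    if (tuples > 2.0 * output_tuples || input_bytes > sort_mem_bytes) {
        // Bounded heap: about N log2 K comparisons, with K doubled to keep the
        // cost curve continuous at the crossover with quicksort.
        return {SortMethod::BoundedHeap, comparison_cost * tuples * std::log2(2.0 * output_tuples)};
    }
    // Everything fits and there is no useful bound: plain quicksort.
    return {SortMethod::Quicksort, comparison_cost * tuples * std::log2(tuples)};
}
```

For 1024 tuples of 100 bytes with 1 MB of sort memory, the sketch picks quicksort with a CPU cost of 0.005 * 1024 * 10 = 51.2; adding a limit of 8 tuples switches it to the bounded heap with 0.005 * 1024 * log2(16) = 20.48.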

## Annotated code
```cpp
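/*
 * cost_recursive_union
 * 	Determines and returns the cost of performing a recursive union, and also the estimated output size.
 *
 * We are given Plans for the nonrecursive and recursive terms.
 *
 * Note that the arguments and output are Plans, not Paths as in most of the rest of this module. That is because we do not bother setting up a Path representation for recursive union: there is only one way to do it.
 */
void cost_recursive_union(Plan* runion, Plan* nrterm, Plan* rterm)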
{
    Cost startup_cost;
    Cost total_cost;
    double total_rows;
    double total_global_rows;

    /* Estimates for the non-recursive term */
    startup_cost = nrterm->startup_cost;
    total_cost = nrterm->total_cost;
    total_rows = PLAN_LOCAL_ROWS(nrterm);
    total_global_rows = nrterm->plan_rows;

    /*
     * Arbitrarily assume that about 10 recursive iterations will be needed, and that we have managed to get a good fix on the cost and output size of each one. These assumptions are very shaky, but it is hard to see how to do better.
     */
    total_cost += 10 * rterm->total_cost;
    total_rows += 10 * PLAN_LOCAL_ROWS(rterm);
    total_global_rows += 10 * rterm->plan_rows;

    /*
     * Also charge cpu_tuple_cost per row to account for the cost of manipulating the tuplestores.
     */
    total_cost += u_sess->attr.attr_sql.cpu_tuple_cost * total_rows;

    runion->startup_cost = startup_cost;
    runion->total_cost = total_cost;
    set_plan_rows(runion, total_global_rows, nrterm->multiple);
    runion->plan_width = Max(nrterm->plan_width, rterm->plan_width);
}

/*
 * cost_sort
 * 	Determines and returns the cost of sorting a relation, including the cost of reading the input data.
 */
void cost_sort(Path* path, List* pathkeys, Cost input_cost, double tuples, int width, Cost comparison_cost,
    int sort_mem, double limit_tuples, bool col_store, int dop, OpMemInfo* mem_info, bool index_sort)
{
    Cost startup_cost = input_cost;
    Cost run_cost = 0;
    double input_bytes = relation_byte_size(tuples, width, col_store, true, true, index_sort) / SET_DOP(dop);
    double output_bytes;
    double output_tuples;
    long sort_mem_bytes = sort_mem * 1024L / SET_DOP(dop);

    dop = SET_DOP(dop);

    if (!u_sess->attr.attr_sql.enable_sort)
        startup_cost += g_instance.cost_cxt.disable_cost;

    /*
     * Make sure the cost of a sort is never estimated as zero, even if the passed-in tuple count is zero. Besides, we must not compute log(0)...
     */
    if (tuples < 2.0) {
        tuples = 2.0;
    }

    /* Include the default cost per comparison */
    comparison_cost += 2.0 * u_sess->attr.attr_sql.cpu_operator_cost;

    if (limit_tuples > 0 && limit_tuples < tuples) {
        output_tuples = limit_tuples;
        output_bytes = relation_byte_size(output_tuples, width, col_store, true, true, index_sort);
    } else {
        output_tuples = tuples;
        output_bytes = input_bytes;
    }

    if (output_bytes > sort_mem_bytes) {
        /*
         * CPU costs
         *
         * Assume about N log2 N comparisons
         */
        startup_cost += comparison_cost * tuples * LOG2(tuples);

        /* Disk costs */
        startup_cost += compute_sort_disk_cost(input_bytes, sort_mem_bytes);
    } else {
        if (tuples > 2 * output_tuples || input_bytes > sort_mem_bytes) {
            /*
             * Use a bounded heap sort, keeping just K tuples in memory, for a total of N log2 K tuple comparisons; but the constant factor is a bit higher than for quicksort. Tweak it so that the cost curve is continuous at the crossover point.
             */
            startup_cost += comparison_cost * tuples * LOG2(2.0 * output_tuples);
        } else {
            /* Use plain quicksort on all the input tuples */
            startup_cost += comparison_cost * tuples * LOG2(tuples);
        }
    }

    if (mem_info != NULL) {
        mem_info->opMem = u_sess->opt_cxt.op_work_mem;
        mem_info->maxMem = output_bytes / 1024L * dop;
        mem_info->minMem = mem_info->maxMem / SORT_MAX_DISK_SIZE;
        mem_info->regressCost = compute_sort_disk_cost(input_bytes, mem_info->minMem);
        /* Special case: if the array is larger than 1GB, the sort must spill to disk */
        if (output_tuples > (MaxAllocSize / TUPLE_OVERHEAD(true) * dop)) {
            mem_info->maxMem = STATEMENT_MIN_MEM * 1024L * dop;
            mem_info->minMem = Min(mem_info->maxMem, mem_info->minMem);
        }
    }

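    /*
     * Also charge a small amount (arbitrarily set equal to the operator cost) per extracted tuple. We do not charge cpu_tuple_cost because a Sort node does no qual checking or projection, so it has less overhead than most plan nodes. Note that it is correct to use tuples rather than output_tuples here: the upper LIMIT node will pro-rate the run cost, so we would otherwise be double-counting the LIMIT.
     */
    run_cost += u_sess->attr.attr_sql.cpu_operator_cost * tuples;

    path->startup_cost = startup_cost;
    path->total_cost = startup_cost + run_cost;
    path->stream_cost = 0;

    if (!u_sess->attr.attr_sql.enable_sort)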
        path->total_cost *=
            (g_instance.cost_cxt.disable_cost_enlarge_factor * g_instance.cost_cxt.disable_cost_enlarge_factor);
}
```

