openGauss
/
blog
249
SQL Engine Query Optimization: Sort Costs in costsize.cpp
Closed
Remit:master
openGauss:master
Remit
create on 2022-01-19 11:04
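## Preface

This post walks through how the cost of a recursive union is computed and how its output size is estimated, and then analyzes the cost of sorting a relation. The sort cost also includes the cost of reading the input data, which is usually left out.

## cost_sort

**Purpose:** determines and returns the cost of sorting a relation, including the cost of reading the input data.

**sort_nums**

* If the total volume of data to sort is less than sort_mem, we do an in-memory sort, which requires no I/O and about t*log2(t) tuple comparisons for t tuples.
* If the total volume exceeds sort_mem, we switch to a tape-style merge algorithm. There are still about t*log2(t) tuple comparisons in total, but each tuple must also be written and read once per merge pass. We expect about ceil(logM(r)) merge passes (the smallest integer ≥ logM(r)), where r is the number of initial runs and M is the merge order used by tuplesort.c.

**Tape-style merge algorithm**

* Since the average initial run should be about twice sort_mem, disk traffic = 2 * relsize * ceil(logM(p / (2*sort_mem)))
* cpu = comparison_cost * t * log2(t)
* Disk traffic is assumed to be 3/4 sequential and 1/4 random access. Random access suits small samples better, a point already raised in the previous post on cost calculation. If this ratio was derived from analysis of large data sets, the assumption is reasonable.

**Bounded heap method:** if the sort is bounded (only the first k result tuples are needed) and k tuples fit into sort_mem, a heap method can be used that keeps only k tuples in the heap. This requires about t*log2(k) tuple comparisons, which is the time complexity of a bounded heap sort.

**Extra cost (the default charge):** by default, the system charges two operator evals per tuple comparison, which should be about right in most cases. The caller can adjust this by specifying a nonzero comparison_cost; typically this covers any extra work needed to prepare the inputs to the comparison operators.

## Code annotations

```cpp
/*
 * cost_recursive_union
 *    Determines and returns the cost of performing a recursive union,
 *    and also the estimated output size.
 *
 * We are given Plans for the nonrecursive and recursive terms.
 *
 * Note that the arguments and the output are Plans, not Paths as in the rest
 * of this module. That is because we do not need a Path representation for
 * recursive unions: there is only one way to do it.
 */
void cost_recursive_union(Plan* runion, Plan* nrterm, Plan* rterm)
{
    Cost startup_cost;
    Cost total_cost;
    double total_rows;
    double total_global_rows;

    /* Start with the estimates for the nonrecursive term */
    startup_cost = nrterm->startup_cost;
    total_cost = nrterm->total_cost;
    total_rows = PLAN_LOCAL_ROWS(nrterm);
    total_global_rows = nrterm->plan_rows;

    /*
     * We arbitrarily assume that about 10 recursive iterations will be needed,
     * and that we have a good fix on the cost and output size of each one.
     * These assumptions are very shaky, but it is hard to see how to do better.
     */
    total_cost += 10 * rterm->total_cost;
    total_rows += 10 * PLAN_LOCAL_ROWS(rterm);
    total_global_rows += 10 * rterm->plan_rows;

    /*
     * Also charge cpu_tuple_cost per row to account for the cost of
     * manipulating the tuplestores.
     */
    total_cost += u_sess->attr.attr_sql.cpu_tuple_cost * total_rows;

    runion->startup_cost = startup_cost;
    runion->total_cost = total_cost;
    set_plan_rows(runion, total_global_rows, nrterm->multiple);
    runion->plan_width = Max(nrterm->plan_width, rterm->plan_width);
}

/*
 * cost_sort
 *    Determines and returns the cost of sorting a relation, including
 *    the cost of reading the input data.
 */
void cost_sort(Path* path, List* pathkeys, Cost input_cost, double tuples, int width, Cost comparison_cost,
    int sort_mem, double limit_tuples, bool col_store, int dop, OpMemInfo* mem_info, bool index_sort)
{
    Cost startup_cost = input_cost;
    Cost run_cost = 0;
    double input_bytes = relation_byte_size(tuples, width, col_store, true, true, index_sort) / SET_DOP(dop);
    double output_bytes;
    double output_tuples;
    long sort_mem_bytes = sort_mem * 1024L / SET_DOP(dop);

    dop = SET_DOP(dop);

    if (!u_sess->attr.attr_sql.enable_sort)
        startup_cost += g_instance.cost_cxt.disable_cost;

    /*
     * Make sure the cost of a sort is never estimated as zero, even if the
     * passed-in tuple count is zero. Besides, we must not do log(0)...
     */
    if (tuples < 2.0) {
        tuples = 2.0;
    }

    /* Include the default cost per comparison */
    comparison_cost += 2.0 * u_sess->attr.attr_sql.cpu_operator_cost;

    if (limit_tuples > 0 && limit_tuples < tuples) {
        output_tuples = limit_tuples;
        output_bytes = relation_byte_size(output_tuples, width, col_store, true, true, index_sort);
    } else {
        output_tuples = tuples;
        output_bytes = input_bytes;
    }

    if (output_bytes > sort_mem_bytes) {
        /*
         * CPU cost
         *
         * Assume about N log2 N comparisons
         */
        startup_cost += comparison_cost * tuples * LOG2(tuples);

        /* Disk cost */
        startup_cost += compute_sort_disk_cost(input_bytes, sort_mem_bytes);
    } else {
        if (tuples > 2 * output_tuples || input_bytes > sort_mem_bytes) {
            /*
             * Use a bounded heap sort that keeps just K tuples in memory, for
             * a total of N log2 K tuple comparisons; but the constant factor
             * is a bit higher than for quicksort. Tweak it so that the cost
             * curve is continuous at the crossover point.
             */
            startup_cost += comparison_cost * tuples * LOG2(2.0 * output_tuples);
        } else {
            /* Use plain quicksort on all the input tuples */
            startup_cost += comparison_cost * tuples * LOG2(tuples);
        }
    }

    if (mem_info != NULL) {
        mem_info->opMem = u_sess->opt_cxt.op_work_mem;
        mem_info->maxMem = output_bytes / 1024L * dop;
        mem_info->minMem = mem_info->maxMem / SORT_MAX_DISK_SIZE;
        mem_info->regressCost = compute_sort_disk_cost(input_bytes, mem_info->minMem);
        /* Special case: if the array is larger than 1GB, we must spill to disk */
        if (output_tuples > (MaxAllocSize / TUPLE_OVERHEAD(true) * dop)) {
            mem_info->maxMem = STATEMENT_MIN_MEM * 1024L * dop;
            mem_info->minMem = Min(mem_info->maxMem, mem_info->minMem);
        }
    }

    /*
     * Also charge a small amount (arbitrarily set equal to the operator cost)
     * per extracted tuple. We do not charge cpu_tuple_cost because a Sort node
     * does no qual-checking or projection, so it has less overhead than most
     * plan nodes. Note that it is correct to use tuples rather than
     * output_tuples here: the upper LIMIT will pro-rate the run cost, so we
     * would be double-counting the LIMIT otherwise.
     */
    run_cost += u_sess->attr.attr_sql.cpu_operator_cost * tuples;

    path->startup_cost = startup_cost;
    path->total_cost = startup_cost + run_cost;
    path->stream_cost = 0;

    if (!u_sess->attr.attr_sql.enable_sort)
        path->total_cost *=
            (g_instance.cost_cxt.disable_cost_enlarge_factor * g_instance.cost_cxt.disable_cost_enlarge_factor);
}
```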
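To make the branch structure of cost_sort easier to follow, here is a minimal standalone sketch of its comparison-count logic and of the merge-pass formula from the tape-style merge section. It is an illustration under simplifying assumptions, not the openGauss implementation: the names `SortMethod`, `SortEstimate`, `sketch_sort_estimate`, and `sketch_merge_traffic` are hypothetical, the relation size is taken as `tuples * bytes_per_tuple` instead of calling relation_byte_size, parallelism (dop) is ignored, and the merge order M is passed in as a plain parameter. Clamping the pass count to at least 1 is also an assumption of this sketch.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical labels for the three strategies that cost_sort distinguishes.
enum class SortMethod { ExternalMerge, BoundedHeap, Quicksort };

struct SortEstimate {
    SortMethod method;
    double comparisons;  // estimated number of tuple comparisons
};

// Sketch of the branch structure in cost_sort (CPU side only).
// limit_tuples <= 0 means "no LIMIT".
SortEstimate sketch_sort_estimate(double tuples, double bytes_per_tuple,
                                  double sort_mem_bytes, double limit_tuples)
{
    // Never estimate on fewer than 2 tuples, so log2() stays positive.
    tuples = std::max(tuples, 2.0);

    double input_bytes = tuples * bytes_per_tuple;  // simplification
    bool bounded = limit_tuples > 0 && limit_tuples < tuples;
    double output_tuples = bounded ? limit_tuples : tuples;
    double output_bytes = output_tuples * bytes_per_tuple;

    if (output_bytes > sort_mem_bytes) {
        // Result does not fit in memory: external merge, about N log2 N.
        return {SortMethod::ExternalMerge, tuples * std::log2(tuples)};
    }
    if (tuples > 2.0 * output_tuples || input_bytes > sort_mem_bytes) {
        // Bounded heap keeping K tuples: about N log2(2K), which keeps the
        // curve continuous with quicksort at the crossover N = 2K.
        return {SortMethod::BoundedHeap, tuples * std::log2(2.0 * output_tuples)};
    }
    // Everything fits: plain quicksort, about N log2 N.
    return {SortMethod::Quicksort, tuples * std::log2(tuples)};
}

// Sketch of: disk traffic = 2 * relsize * ceil(logM(relsize / (2 * sort_mem))).
// Returns the traffic in the same unit as input_bytes.
double sketch_merge_traffic(double input_bytes, double sort_mem_bytes,
                            double merge_order)
{
    // Initial runs average about twice sort_mem.
    double runs = input_bytes / (2.0 * sort_mem_bytes);
    // Assumption of this sketch: at least one merge pass is always charged.
    double passes = std::max(1.0, std::ceil(std::log(runs) / std::log(merge_order)));
    // Each pass writes and then reads every tuple once.
    return 2.0 * input_bytes * passes;
}
```

With 1024 tuples of 100 bytes and 1,000,000 bytes of sort memory, `sketch_sort_estimate(1024, 100, 1e6, 0)` picks quicksort with about 1024 * 10 = 10240 comparisons, while adding a limit of 8 tuples switches to the bounded heap with about 1024 * log2(16) = 4096 comparisons. Shrinking sort memory to 1000 bytes without a limit selects the external merge branch.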