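Finding compile-time bottlenecks: ClangBuildAnalyzer, a handy tool for analyzing C++ build times
As a project grows and its dependencies become more complex, C++ compile times go up. To find out where the time goes, the open-source tool ClangBuildAnalyzer reports the time spent per file, per function, on header inclusion and expansion, and so on, which gives you a factual basis for compile-time optimization.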
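1. Background
My recent C++ project has only a few dozen source files, yet it takes a very long time to build: even with a multi-threaded build on a MacBook with an M1 Pro chip, a full build takes roughly 2 to 4 minutes. After putting up with this for a long time, I finally decided to track down the cause, and ClangBuildAnalyzer is the tool I used.
ClangBuildAnalyzer is not the only tool of its kind. In my opinion its strengths are that it is simple to use, cross-platform, open source, and easy to build; its drawback is that it only supports Clang.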
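The Meituan engineering team has published an article that discusses C++ compile-time optimization in considerable depth and is well worth studying and putting into practice: "C++服务编译耗时优化原理及实践" (Principles and Practice of Compile-Time Optimization for C++ Services).
2. Building
Download the source from GitHub (https://github.com/aras-p/ClangBuildAnalyzer) and build it with CMake: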
mkdir build && cd build
cmake .. && make
The ClangBuildAnalyzer executable is generated in the build directory.
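3. Preparation
For ClangBuildAnalyzer to be able to analyze compile times, you also need to add -ftime-trace to the Clang compiler flags of your own project. If you use Xcode, add the -ftime-trace option under Xcode project -> Build Settings -> Other C++ Flags.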
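Outside Xcode, the same flag goes on the compiler command line. The sketch below is a quick sanity check rather than part of the original workflow: hello.cpp is a throwaway file name, and it assumes a clang++ on PATH that is recent enough to know -ftime-trace (the flag was added in Clang 9). Clang should then write a JSON trace next to the object file.

```shell
# Create a throwaway source file (hypothetical name) for the check.
printf 'int main() { return 0; }\n' > hello.cpp

# Compile with tracing on, if clang++ is available.
if command -v clang++ >/dev/null 2>&1; then
    clang++ -ftime-trace -c hello.cpp -o hello.o
    # The trace is expected next to the object file, as hello.json.
    ls hello.json
fi

# For a CMake project the flag can be passed at configure time, for example:
#   cmake -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_CXX_FLAGS="-ftime-trace" ..
```

If hello.json shows up, the toolchain is ready and the flag can be added to the real project's build configuration.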
4. Usage
Start the capture. In a terminal, run:
./ClangBuildAnalyzer --start <artifacts_folder>
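<artifacts_folder> is the directory where the intermediate .o (object) files are generated during the build. --start marks the beginning of the capture; the -ftime-trace JSON files that Clang writes next to those object files from this point on are what ClangBuildAnalyzer later uses to work out compile times and include dependencies.
Run the build:
Now build your own project as usual.
Stop the capture. In a terminal, run: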
./ClangBuildAnalyzer --stop <artifacts_folder> analy_log.log
<artifacts_folder> is the same directory that was passed to --start. analy_log.log is the file the captured data is saved to; the name is up to you.
Analyze the timings:
./ClangBuildAnalyzer --analyze analy_log.log
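The start, build, stop, analyze sequence above can be wrapped in a small shell function so that every profiling run goes through the same steps. This is only a sketch: profile_build is a hypothetical helper name, and the paths and build command in the example call are placeholders for your own project.

```shell
# profile_build ANALYZER ARTIFACTS_DIR CAPTURE_FILE BUILD_COMMAND...
# Hypothetical helper: runs one capture cycle around an arbitrary build command.
profile_build() {
    local analyzer="$1"    # path to the ClangBuildAnalyzer executable
    local artifacts="$2"   # directory where the .o files are generated
    local capture="$3"     # capture file written by --stop
    shift 3                # everything left is the build command

    "$analyzer" --start "$artifacts" || return 1
    "$@" || return 1       # run the build itself
    "$analyzer" --stop "$artifacts" "$capture" || return 1
    "$analyzer" --analyze "$capture"
}

# Example call with placeholder paths and a make-based build:
#   profile_build ./ClangBuildAnalyzer ../my_project/build analy_log.log make -C ../my_project/build
```

Trace files are only written for sources that are actually compiled, so a clean rebuild is likely to give the most complete picture.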
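Timing report:
With this report in hand you can target your compile-time optimizations. After each round of optimization, run the analysis again, and repeat until the build time is acceptable. The report roughly covers:
- Total parsing time
- Total codegen and optimization time
- Compile time per file
- Template instantiation time
- Compile time per function
- Cost of header includes and their include chains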
Analyzing build trace from 'artifacts/FullCapture.bin'...
**** Time summary:
Compilation (7664 times):
Parsing (frontend): 2118.9 s
Codegen & opts (backend): 1204.1 s
**** Files that took longest to parse (compiler frontend):
5084 ms: cycles_scene.build/RelWithDebInfo/volume.o
4471 ms: extern_ceres.build/RelWithDebInfo/covariance_impl.o
4225 ms: bf_intern_libmv.build/RelWithDebInfo/resect.o
4121 ms: bf_blenkernel.build/RelWithDebInfo/volume_to_mesh.o
**** Files that took longest to codegen (compiler backend):
47123 ms: bf_blenkernel.build/RelWithDebInfo/volume.o
39617 ms: bf_blenkernel.build/RelWithDebInfo/volume_to_mesh.o
37488 ms: bf_modifiers.build/RelWithDebInfo/MOD_volume_displace.o
30676 ms: bf_gpu.build/RelWithDebInfo/gpu_shader_create_info.o
**** Templates that took longest to instantiate:
11172 ms: fmt::detail::vformat_to<char> (142 times, avg 78 ms)
6662 ms: std::__scalar_hash<std::_PairT, 2>::operator() (3549 times, avg 1 ms)
6281 ms: std::__murmur2_or_cityhash<unsigned long, 64>::operator() (3549 times, avg 1 ms)
5757 ms: std::basic_string<char>::basic_string (3597 times, avg 1 ms)
5541 ms: blender::CPPType::to_static_type_tag<float, blender::VecBase<float, ... (70 times, avg 79 ms)
**** Template sets that took longest to instantiate:
32421 ms: std::unique_ptr<$> (30461 times, avg 1 ms)
30098 ms: Eigen::MatrixBase<$> (8639 times, avg 3 ms)
27524 ms: Eigen::internal::call_assignment_no_alias<$> (2397 times, avg 11 ms)
**** Functions that took longest to compile:
28359 ms: gpu_shader_create_info_init (source/blender/gpu/intern/gpu_shader_create_info.cc)
4090 ms: ccl::GetConstantValues(ccl::KernelData const*) (intern/cycles/device/metal/kernel.mm)
3996 ms: gpu_shader_dependency_init (source/blender/gpu/intern/gpu_shader_dependency.cc)
**** Function sets that took longest to compile / optimize:
10606 ms: bool openvdb::v10_0::tree::NodeList<$>::initNodeChildren<$>(openvdb:... (470 times, avg 22 ms)
9640 ms: void tbb::interface9::internal::dynamic_grainsize_mode<$>::work_bala... (919 times, avg 10 ms)
9459 ms: void tbb::interface9::internal::dynamic_grainsize_mode<$>::work_bala... (715 times, avg 13 ms)
7279 ms: blender::Vector<$>::realloc_to_at_least(long long) (1840 times, avg 3 ms)
**** Expensive headers:
261580 ms: /Developer/SDKs/MacOSX13.1.sdk/usr/include/c++/v1/algorithm (included 3389 times, avg 77 ms), included via:
341x: BKE_context.h BLI_string_ref.hh string
180x: DNA_mesh_types.h BLI_math_vector_types.hh array
125x: DNA_space_types.h DNA_node_types.h DNA_node_tree_interface_types.h BLI_function_ref.hh BLI_memory_utils.hh
...
188777 ms: /Developer/SDKs/MacOSX13.1.sdk/usr/include/c++/v1/string (included 3447 times, avg 54 ms), included via:
353x: BKE_context.h BLI_string_ref.hh
184x: DNA_mesh_types.h BLI_offset_indices.hh BLI_index_mask.hh BLI_linear_allocator.hh BLI_string_ref.hh
131x: DNA_node_types.h DNA_node_tree_interface_types.h BLI_span.hh
...
174792 ms: source/blender/makesdna/DNA_node_types.h (included 1653 times, avg 105 ms), included via:
316x: ED_screen.hh DNA_space_types.h
181x: DNA_space_types.h
173x: <direct include>
...
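On a large project the report gets long, so it can help to save it to a file and look at one section at a time. A minimal sketch, assuming the section headers start with "**** " at the beginning of a line as in the sample above; print_section and report.txt are hypothetical names.

```shell
# print_section REPORT_FILE TITLE
# Hypothetical helper: prints the report section whose header starts with
# "**** TITLE", up to (not including) the next "**** " header.
print_section() {
    awk -v title="$2" '
        index($0, "**** ") == 1 { show = (index($0, "**** " title) == 1) }
        show
    ' "$1"
}

# Example: save a report, then look only at the header costs.
#   ./ClangBuildAnalyzer --analyze analy_log.log > report.txt
#   print_section report.txt "Expensive headers"
```

Keeping the saved report from each round also makes it easy to compare sections before and after an optimization.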