
Finding Build-Time Bottlenecks: ClangBuildAnalyzer, a Handy Tool for Analyzing C++ Compile Times

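Background

Recently a C++ project of mine with only a few dozen files was taking a very long time to build: even with multi-threaded compilation on a MacBook with an M1 Pro processor, a build took close to 2-4 minutes. After putting up with it for a long time, I finally decided to track down the cause.

As a project grows and its dependencies become more complex, C++ compile times increase. To find the bottlenecks, the open-source tool ClangBuildAnalyzer can break down the time spent on files, functions, header inclusion and so on, which gives you a basis for compile-time optimization.

Of course, there are plenty of other tools besides ClangBuildAnalyzer. In my opinion its strengths are that it is simple to use, cross-platform, open source, and easy to build; its drawback is that it only supports Clang.

On the topic of C++ build optimization, the Meituan technical team has an article that explores it in some depth and is well worth studying and putting into practice: "C++服务编译耗时优化原理及实践" (principles and practice of optimizing compile times for C++ services).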

Building

Download the source from GitHub (https://github.com/aras-p/ClangBuildAnalyzer) and build it with CMake:

mkdir build && cd build
cmake .. && make

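The ClangBuildAnalyzer executable is generated in the build directory.

Preparation

For ClangBuildAnalyzer to be able to analyze compile times, you also need to add -ftime-trace to the Clang compiler flags of your own project. If you use Xcode, add the -ftime-trace option under Build Settings -> Other C++ Flags in the Xcode project.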

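For a project that is not built with Xcode, the effect of the flag can be checked on a single file. The following is a minimal sketch, assuming clang++ is on the PATH and is recent enough to support -ftime-trace; demo.cpp and the temporary directory are made-up names used only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: compile one throwaway file with -ftime-trace and check for the trace.
# demo.cpp and the temporary directory are made-up names for illustration.
set -u

workdir="$(mktemp -d)"
cat > "$workdir/demo.cpp" <<'EOF'
#include <string>
#include <vector>
int main() {
    std::vector<std::string> v{"a", "b"};
    return static_cast<int>(v.size()) - 2;
}
EOF

trace_found=no
if command -v clang++ >/dev/null 2>&1; then
    # With -ftime-trace, Clang writes demo.json next to demo.o.
    if clang++ -std=c++17 -ftime-trace -c "$workdir/demo.cpp" -o "$workdir/demo.o"; then
        [ -f "$workdir/demo.json" ] && trace_found=yes
    fi
fi
echo "time trace written: $trace_found"
```

If the JSON file shows up next to the object file, the flag is reaching the compiler, and the same flag can then be added to the real build.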

Usage

Start tracing by running the following in a terminal:

./ClangBuildAnalyzer --start <artifacts_folder>

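<artifacts_folder> is the directory where the build writes its intermediate .o (object) files. With -ftime-trace, Clang writes a JSON trace file next to each object file, and these traces are what ClangBuildAnalyzer uses to work out compile times and include dependencies.

Run the build

Now build your own project.

Stop tracing by running the following in a terminal: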

./ClangBuildAnalyzer --stop <artifacts_folder> analy_log.log

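<artifacts_folder> is the same directory that was passed to --start, and analy_log.log is the file the captured trace data is saved to; you can name it whatever you like.

Analyze the build times: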

./ClangBuildAnalyzer --analyze analy_log.log
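The three commands and the build in between can be chained so that no step is forgotten. This is only a sketch: the CBA variable, the profile_build function and the example arguments are hypothetical names, and the build command is whatever your project normally uses.

```shell
#!/usr/bin/env bash
# Sketch: wrap start -> build -> stop -> analyze in one function.
# CBA, profile_build and the example arguments are illustrative names.
set -u

CBA="${CBA:-./ClangBuildAnalyzer}"   # path to the ClangBuildAnalyzer binary

profile_build() {
    local artifacts="$1" capture="$2"
    shift 2                          # remaining arguments are the build command
    if [ ! -x "$CBA" ]; then
        echo "ClangBuildAnalyzer not found at $CBA" >&2
        return 1
    fi
    "$CBA" --start "$artifacts" || return 1
    "$@"                             # the build; it must compile with -ftime-trace
    local build_status=$?
    "$CBA" --stop "$artifacts" "$capture" || return 1
    "$CBA" --analyze "$capture"
    return "$build_status"
}

# Example call, commented out so that sourcing this file has no side effects:
# profile_build build analy_log.log make -C build -j8
```

The function still stops the capture and prints the report when the build itself fails, and then returns the build's exit status, so a broken build is not hidden behind a successful analysis.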

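Build time analysis report

With this report in hand you can optimize the build in a targeted way: after each round of optimization, run the analysis again, and repeat until the build time is acceptable. The report roughly covers:

  • Total parsing time
  • Total codegen and optimization time
  • Per-file compile time
  • Template instantiation time
  • Function compile time
  • Header include time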
Analyzing build trace from 'artifacts/FullCapture.bin'...
**** Time summary:
Compilation (7664 times):
  Parsing (frontend):         2118.9 s
  Codegen & opts (backend):   1204.1 s

**** Files that took longest to parse (compiler frontend):
  5084 ms: cycles_scene.build/RelWithDebInfo/volume.o
  4471 ms: extern_ceres.build/RelWithDebInfo/covariance_impl.o
  4225 ms: bf_intern_libmv.build/RelWithDebInfo/resect.o
  4121 ms: bf_blenkernel.build/RelWithDebInfo/volume_to_mesh.o
 
**** Files that took longest to codegen (compiler backend):
 47123 ms: bf_blenkernel.build/RelWithDebInfo/volume.o
 39617 ms: bf_blenkernel.build/RelWithDebInfo/volume_to_mesh.o
 37488 ms: bf_modifiers.build/RelWithDebInfo/MOD_volume_displace.o
 30676 ms: bf_gpu.build/RelWithDebInfo/gpu_shader_create_info.o

**** Templates that took longest to instantiate:
 11172 ms: fmt::detail::vformat_to<char> (142 times, avg 78 ms)
  6662 ms: std::__scalar_hash<std::_PairT, 2>::operator() (3549 times, avg 1 ms)
  6281 ms: std::__murmur2_or_cityhash<unsigned long, 64>::operator() (3549 times, avg 1 ms)
  5757 ms: std::basic_string<char>::basic_string (3597 times, avg 1 ms)
  5541 ms: blender::CPPType::to_static_type_tag<float, blender::VecBase<float, ... (70 times, avg 79 ms)

**** Template sets that took longest to instantiate:
 32421 ms: std::unique_ptr<$> (30461 times, avg 1 ms)
 30098 ms: Eigen::MatrixBase<$> (8639 times, avg 3 ms)
 27524 ms: Eigen::internal::call_assignment_no_alias<$> (2397 times, avg 11 ms)

**** Functions that took longest to compile:
 28359 ms: gpu_shader_create_info_init (source/blender/gpu/intern/gpu_shader_create_info.cc)
  4090 ms: ccl::GetConstantValues(ccl::KernelData const*) (intern/cycles/device/metal/kernel.mm)
  3996 ms: gpu_shader_dependency_init (source/blender/gpu/intern/gpu_shader_dependency.cc)

**** Function sets that took longest to compile / optimize:
 10606 ms: bool openvdb::v10_0::tree::NodeList<$>::initNodeChildren<$>(openvdb:... (470 times, avg 22 ms)
  9640 ms: void tbb::interface9::internal::dynamic_grainsize_mode<$>::work_bala... (919 times, avg 10 ms)
  9459 ms: void tbb::interface9::internal::dynamic_grainsize_mode<$>::work_bala... (715 times, avg 13 ms)
  7279 ms: blender::Vector<$>::realloc_to_at_least(long long) (1840 times, avg 3 ms)
 
**** Expensive headers:
261580 ms: /Developer/SDKs/MacOSX13.1.sdk/usr/include/c++/v1/algorithm (included 3389 times, avg 77 ms), included via:
  341x: BKE_context.h BLI_string_ref.hh string 
  180x: DNA_mesh_types.h BLI_math_vector_types.hh array 
  125x: DNA_space_types.h DNA_node_types.h DNA_node_tree_interface_types.h BLI_function_ref.hh BLI_memory_utils.hh 
  ...

188777 ms: /Developer/SDKs/MacOSX13.1.sdk/usr/include/c++/v1/string (included 3447 times, avg 54 ms), included via:
  353x: BKE_context.h BLI_string_ref.hh 
  184x: DNA_mesh_types.h BLI_offset_indices.hh BLI_index_mask.hh BLI_linear_allocator.hh BLI_string_ref.hh 
  131x: DNA_node_types.h DNA_node_tree_interface_types.h BLI_span.hh 
  ...

174792 ms: source/blender/makesdna/DNA_node_types.h (included 1653 times, avg 105 ms), included via:
  316x: ED_screen.hh DNA_space_types.h 
  181x: DNA_space_types.h 
  173x: <direct include>
  ...
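Because a full report can be long, it can help to save it to a file and look at one section at a time. The sketch below relies only on the layout visible in the sample above, where each section starts with a line beginning with "**** "; report_section and report.txt are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: print one "**** <title>" section of a saved report.
# report_section is a hypothetical helper name.
set -u

report_section() {
    local title="$1" report="$2"
    awk -v want="**** $title" '
        # Each section header starts with "**** "; print only the wanted one.
        /^\*\*\*\* / { printing = (index($0, want) == 1) }
        printing
    ' "$report"
}

# Example, assuming the report was saved first:
#   ./ClangBuildAnalyzer --analyze analy_log.log > report.txt
#   report_section "Expensive headers" report.txt
```

Matching on the start of the header line means a shortened title such as "Files that took longest to parse" is enough to select a section.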


This post is licensed under CC BY 4.0 by the author.