Three Ways to Count Code in a Community

zhushangyuan_
Published 2023-3-26 18:29

A Summary of Ways to Measure Community Code

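Every Pull Request or commit a developer submits carries a change size: how many lines were added and how many were deleted. Counting these is a dynamic, process-oriented measure. You can also count the code in the repository itself, ignoring the submission history and looking only at the code that remains; this is a static measure. Every developer cares about how much code they have contributed, so this article summarizes the various ways of counting code.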

1. Using git log to count the changes in each commit

First, a look at the relevant git log options.

git log options

--numstat

Similar to --stat, but shows number of added and deleted lines in decimal notation and pathname without abbreviation, to make it more machine friendly. For binary files, outputs two - instead of saying 0 0.

--shortstat

Output only the last line of the --stat format containing total number of modified files, as well as number of added and deleted lines.

--stat

The --stat option adds per-file statistics on added, deleted, and modified lines to the regular git log output.

$ git log --stat
commit fa71c098e2912b69a1c82348d403b3260f2dc64e (HEAD -> temp_temp)
Author: zz********g <z********g@gmail.com>
Date:   Wed Aug 12 17:19:05 2020 +0800
 add txt file and dir           # commit message
txt/a.txt | 1 +                   # per-file change: how many lines were added or deleted
1 file changed, 1 insertion(+)    # summary: number of files changed

Running git log with the --numstat and --shortstat options produces output like the following:

commit d0411d5e8d26be3abde076e24f026b25cc2e7819 (HEAD -> master, origin/master, origin/HEAD)
Merge: ae99435 faf351b
Author: ******** <d********g@h****I.com>
Date:   Tue Feb 21 12:54:28 2023 +0000

    !1358 add communication_dsoftbus commiter
    Merge pull request !1358 from michael4096/master

commit ae99435b2347d4b648c03f9dcf7d7e095bb150a4
Author: z********o <z********0@h****I.com>
Date:   Tue Feb 21 12:42:59 2023 +0000

    !1360 Add libabigail and elfutils to openharmony-sig
    * Add libabigail and elfutils to openharmony-sig

2       0       sig/sig-basicsoftwareservice/sig-basicsoftwareservice.md
2       0       sig/sig-basicsoftwareservice/sig-basicsoftwareservice_cn.md
6       2       sig/sigs.json
 3 files changed, 10 insertions(+), 2 deletions(-)

Spawn a subprocess:

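import subprocess

# cmd, _encoding, WORKING_DIR and project_name are defined elsewhere in the script
process = subprocess.Popen(cmd,
                           stdout=subprocess.PIPE,
                           stderr=subprocess.PIPE,
                           encoding=_encoding, cwd=WORKING_DIR + project_name)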

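In the checked-out repository directory, run cmd = ['git', 'log', '--shortstat', '--numstat'] and process the output to obtain the lines added and deleted by every commit in the repository.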
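The "process the output" step can be sketched roughly as follows. This is a minimal illustration, not the author's actual script: the function names git_log_numstat and parse_numstat_log are hypothetical, and the parser assumes the default git log layout, where each commit starts with a "commit <hash>" line and each --numstat row is tab-separated.

```python
import re
import subprocess

COMMIT_RE = re.compile(r"^commit ([0-9a-f]{7,40})")
# A --numstat row is "<added>\t<deleted>\t<path>"; binary files show "-" for both counts.
NUMSTAT_RE = re.compile(r"^(\d+|-)\t(\d+|-)\t(.+)$")


def git_log_numstat(repo_dir):
    """Run the git log command from the article and return its text output."""
    result = subprocess.run(
        ["git", "log", "--shortstat", "--numstat"],
        cwd=repo_dir, capture_output=True,
        encoding="utf-8", errors="ignore", check=True,
    )
    return result.stdout


def parse_numstat_log(log_text):
    """Return {commit_hash: {"author": str, "added": int, "deleted": int}}."""
    stats = {}
    current = None
    for line in log_text.splitlines():
        m = COMMIT_RE.match(line)
        if m:
            current = {"author": "", "added": 0, "deleted": 0}
            stats[m.group(1)] = current
            continue
        if current is None:
            continue
        if line.startswith("Author:"):
            current["author"] = line[len("Author:"):].strip()
            continue
        m = NUMSTAT_RE.match(line)
        if m:
            added, deleted, _path = m.groups()
            # Binary files have no line counts; treat them as 0 lines changed.
            current["added"] += int(added) if added != "-" else 0
            current["deleted"] += int(deleted) if deleted != "-" else 0
    return stats
```

Calling parse_numstat_log(git_log_numstat(path_to_repo)) gives one entry per commit. Merge commits, which carry no numstat rows by default, simply end up with zero added and zero deleted lines.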

2. Counting the code in a repository

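Use the cloc tool to count the code in a repository. The relevant cloc options are shown below: --read-lang-def supplies additional language definitions (the closely related --force-lang-def, documented below, replaces the built-in definitions instead), and --by-file-by-lang reports results per file as well as per language.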

--read-lang-def=E:\\WorkSpace\\lmk-bohan\\stat-data\\my_definitions.txt --by-file-by-lang
Option descriptions
 --force-lang-def=<file>   Load language processing filters from <file>,
                             then use these filters instead of the built-in
                             filters.  Note:  languages which map to the same
                             file extension (for example:
                             MATLAB/Mathematica/Objective-C/MUMPS/Mercury;
                             Pascal/PHP; Lisp/OpenCL; Lisp/Julia; Perl/Prolog)
                             will be ignored as these require additional
                             processing that is not expressed in language
                             definition files.  Use --read-lang-def to define
                             new language filters without replacing built-in
                             filters (see also --write-lang-def,
                             --write-lang-def-incl-dup).
                       
--by-file-by-lang         Report results for every source file encountered
                             in addition to reporting by language.

In the repository directory, spawn a subprocess:

process = subprocess.Popen(cmd,
                           stdout=subprocess.PIPE,
                           stderr=subprocess.PIPE,
                           encoding='utf-8', cwd=WORKING_DIR, errors='ignore')

with the following command:

cmd = ['stat-data\\cloc-1.92.exe', '--read-lang-def=' + _current_file + '\\stat-data\\my_definitions.txt',
       '--by-file-by-lang', name]

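This yields the line counts for every source file in the repository, broken down into blank lines, comment lines, and code lines. These are usually added together and all counted as code volume. Sample output: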

     100 files
     200 files
     300 files
     312 text files.
classified 254 files
Duplicate file check 254 files (245 known unique)
Unique:      100 files                                          
Unique:      200 files                                          
     251 unique files.                              
Counting:  100
Counting:  200
     131 files ignored.

github.com/AlDanial/cloc v 1.92  T=0.24 s (1034.9 files/s, 69044.2 lines/s)
-----------------------------------------------------------------------------------------
File   									blank        comment           code
-----------------------------------------------------------------------------------------
community\sig\sigs.json       			 0              0           1068
community\zh\committer.md     			 0              0            442
community\sig\sig_list.toml   			 49              1            283
community\sig\README.md       			 46              0            190
...
community\sig\sig-linkboy\oh\oh8.md      7              0              4
community\sig\sig-linkboy\sig_linkboy.md 4              0              4
community\sig\sig-linkboy\oh\oh3.md      6              0              3
-----------------------------------------------------------------------------------------
SUM:                                     3684              1          13061
-----------------------------------------------------------------------------------------

-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Markdown                       248           3635              0          11685
JSON                             2              0              0           1093
TOML                             1             49              1            283
-------------------------------------------------------------------------------
SUM:                           251           3684              1          13061
-------------------------------------------------------------------------------
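A rough sketch of turning the per-file table above into numbers is shown below. It is only an illustration under stated assumptions: the function names parse_cloc_by_file and total_lines are hypothetical, and the parser relies on the text layout shown in the sample output (a "File blank comment code" header, one row per file, then a "Language" table).

```python
import re

FILE_HEADER_RE = re.compile(r"^File\s+blank\s+comment\s+code\s*$")
LANG_HEADER_RE = re.compile(r"^Language\s+files\s+blank\s+comment\s+code\s*$")
# A data row is a path followed by three integers: blank, comment, code.
ROW_RE = re.compile(r"^(.*\S)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def parse_cloc_by_file(output):
    """Return {file_path: (blank, comment, code)} from the per-file table."""
    per_file = {}
    in_file_table = False
    for line in output.splitlines():
        if FILE_HEADER_RE.match(line):
            in_file_table = True
            continue
        if LANG_HEADER_RE.match(line):
            # The per-language table follows; it is not parsed here.
            in_file_table = False
            continue
        if not in_file_table or line.startswith("SUM:"):
            continue
        m = ROW_RE.match(line)
        if m:
            path, blank, comment, code = m.groups()
            per_file[path] = (int(blank), int(comment), int(code))
    return per_file


def total_lines(per_file):
    """Add blank, comment and code lines together, as the article describes."""
    return sum(blank + comment + code for blank, comment, code in per_file.values())
```

The keys of the returned dictionary also serve as the list of files to examine in later steps. Parsing the human-readable table is somewhat fragile; cloc also offers machine-readable output through its --csv and --json options, which may be a sturdier choice.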

3. Counting code per contributor in a repository

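The previous section counts the code in a repository. Sometimes you also want to know how much code each developer has contributed to that repository, and, by grouping on email domain, how much each organization has contributed.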

For each source file that cloc counted in the previous section, run the git blame command:

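  • Iterate over the list of files reported by cloc and run git blame -e on each to obtain the contributors and the number of lines each contributed: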
cmd = ['git', 'blame', '-e', _file_path]

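Sample output follows. It shows who last contributed each line of the file. If a line was added by A and later modified by B, it is attributed to B: whoever modified a line last is credited with it.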

git blame -e default.xml
7d700a41 (<z**************n@h****I.com> 2022-05-07 21:01:53 +0800 1) <?xml version="1.0" encoding="UTF-8"?>
7d700a41 (<z**************n@h****I.com> 2022-05-07 21:01:53 +0800 2) <manifest>
7d700a41 (<z**************n@h****I.com> 2022-05-07 21:01:53 +0800 3)   <remote fetch="."1 name="origin" review="https://openharmony.gitee.com/openharmony/"/>
7d700a41 (<z**************n@h****I.com> 2022-05-07 21:01:53 +0800 4)   <default remote="origin" revision="master" sync-j="4" />
7d700a41 (<z**************n@h****I.com> 2022-05-07 21:01:53 +0800 5)
7d700a41 (<z**************n@h****I.com> 2022-05-07 21:01:53 +0800 6)   <include name="ohos/ohos.xml" />
7d700a41 (<z**************n@h****I.com> 2022-05-07 21:01:53 +0800 7)   <include name="chipsets/all.xml" />
7d700a41 (<z**************n@h****I.com> 2022-05-07 21:01:53 +0800 8) </manifest>

D:\codes\code-count\manifest>git blame -e devboard.xml
019f8d3f (<m***************1@h****I.com> 2021-04-20 15:11:42 +0800  1) <?xml version="1.0" encoding="UTF-8"?>
019f8d3f (<m***************1@h****I.com> 2021-04-20 15:11:42 +0800  2) <manifest>
9e17c922 (<m***************1@h****I.com> 2021-11-27 06:33:25 +0000  3)     <remote fetch="https://gitee.com/openharmony-sig" name="sig" review="https://gitee.com/openharmony-sig/"/>
019f8d3f (<m***************1@h****I.com> 2021-04-20 15:11:42 +0800  4)     <include name="default.xml" />
019f8d3f (<m***************1@h****I.com> 2021-04-20 15:11:42 +0800  5)     <project name="device_st" path="device/st" revision="master" remote="sig"/>
c3deb066 (<l*************g@h****I.com>    2021-04-28 17:04:50 +0800  6)     <project name="device_allwinner" path="device/allwinner" revision="master" remote="sig"/>
c3deb066 (<l*************g@h****I.com>    2021-04-28 17:04:50 +0800  7)     <project name="vendor_h****I_ipcamera_v3s" path="vendor/h****I/ipcamera_v3s" revision="master" remote="sig"/>
019f8d3f (<m***************1@h****I.com> 2021-04-20 15:11:42 +0800  8)     <project name="vendor_h****I_minidisplay_demo" path="vendor/h****I/minidisplay_demo" revision="master" remote="sig"/>
2bb5688e (<l*************g@h****I.com>    2021-06-07 10:45:13 +0800  9)     <project name="device_mediatek" path="device/mediatek" revision="master" remote="sig"/>
2bb5688e (<l*************g@h****I.com>    2021-06-07 10:45:13 +0800 10)     <project name="device_nordic" path="device/nordic" revision="master" remote="sig"/>
2bb5688e (<l*************g@h****I.com>    2021-06-07 10:45:13 +0800 11)     <project name="device_nxp" path="device/nxp" revision="master" remote="sig"/>
2bb5688e (<l*************g@h****I.com>    2021-06-07 10:45:13 +0800 12)     <project name="device_fudanmicro" path="device/fudanmicro" revision="master" remote="sig"/>
2bb5688e (<l*************g@h****I.com>    2021-06-07 10:45:13 +0800 13)     <project name="device_bestechnic" path="device/bestechnic" revision="master" remote="sig"/>
2bb5688e (<l*************g@h****I.com>    2021-06-07 10:45:13 +0800 14)     <project name="device_ingenic" path="device/ingenic" revision="master" remote="sig"/>
2bb5688e (<l*************g@h****I.com>    2021-06-07 10:45:13 +0800 15)     <project name="device_espressif" path="device/espressif" revision="master" remote="sig"/>
2bb5688e (<l*************g@h****I.com>    2021-06-07 10:45:13 +0800 16)     <project name="device_winnermicro" path="device/winnermicro" revision="master" remote="sig"/>
2bb5688e (<l*************g@h****I.com>    2021-06-07 10:45:13 +0800 17)     <project name="device_unisoc" path="device/unisoc" revision="master" remote="sig"/>
2bb5688e (<l*************g@h****I.com>    2021-06-07 10:45:13 +0800 18)     <project name="device_broadcom" path="device/broadcom" revision="master" remote="sig"/>
2bb5688e (<l*************g@h****I.com>    2021-06-07 10:45:13 +0800 19)     <project name="device_realtek" path="device/realtek" revision="master" remote="sig"/>
2bb5688e (<l*************g@h****I.com>    2021-06-07 10:45:13 +0800 20)     <project name="device_bouffalolab" path="device/bouffalolab" revision="master" remote="sig"/>
019f8d3f (<m***************1@h****I.com> 2021-04-20 15:11:42 +0800 21) </manifest>
2bb5688e (<l*************g@h****I.com>    2021-06-07 10:45:13 +0800 22)
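Output like the above can be tallied per contributor, and then per email domain, along the following lines. This is a hedged sketch rather than the author's code: the names run_blame, count_blame_lines and group_by_domain are hypothetical, and the regular expression assumes the default git blame -e layout shown above, where each line carries "(<email> date time zone line-number)".

```python
import re
import subprocess
from collections import Counter

# Matches the "(<email> YYYY-MM-DD " part at the start of each blame line.
BLAME_EMAIL_RE = re.compile(r"\(<([^>]*)>\s+\d{4}-\d{2}-\d{2} ")


def run_blame(repo_dir, file_path):
    """Run git blame -e on one file and return its text output."""
    result = subprocess.run(
        ["git", "blame", "-e", file_path],
        cwd=repo_dir, capture_output=True,
        encoding="utf-8", errors="ignore", check=True,
    )
    return result.stdout


def count_blame_lines(blame_output):
    """Return a Counter mapping author email to the number of lines attributed to it."""
    by_email = Counter()
    for line in blame_output.splitlines():
        m = BLAME_EMAIL_RE.search(line)
        if m:
            by_email[m.group(1).lower()] += 1
    return by_email


def group_by_domain(by_email):
    """Fold per-email counts into per-domain counts (the part after '@')."""
    by_domain = Counter()
    for email, lines in by_email.items():
        domain = email.rsplit("@", 1)[-1] if "@" in email else "(unknown)"
        by_domain[domain] += lines
    return by_domain
```

Because Counter objects can be added together, the per-file results can be accumulated across the whole file list with a running total such as total += count_blame_lines(run_blame(repo_dir, path)).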

Summary

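This article covered several ways of counting code. Counting the additions and deletions of each commit emphasizes contribution during the development process, but bulk additions or deletions can easily inflate the numbers. cloc shows the latest, final size of the repository, in which bulk additions and deletions along the way cancel out. git blame focuses on the code attributable to each contributor. This article was written in a hurry and within the limits of my ability, so please point out any mistakes and feel free to fill in anything I missed. Thanks for reading; if you have questions, please leave a comment.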

© Copyright belongs to the author. If you repost this article, please credit the source; otherwise legal action may be taken.
Last modified 2023-5-30 15:26:08
2 replies

红叶亦知秋

Counting the code that survives in the repository is a fairly objective way to see one's own contribution to open source.

2023-3-27 10:48:06

SummerRic

666

2023-3-28 11:13:15