% find . -print | awk -F . '{print $NF}' | sort | uniq -c | sort -n
(awk -F . '{print $NF}' keeps only the last dot-separated field, i.e. the file extension.)
For example, for the libssh-0.9.4 project:
...
      1 yml
      2 md
      2 png
      2 sh
      6 in
      8 svg
      9 pub
     12 dox
     14 symbols
     14 txt
     27 cmake
     65 h
    174 c
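The one-liner above also counts directories, and for a name without a dot it prints the whole path as the "extension". Here is a minimal sketch of a stricter variant, assuming you want regular files only and a separate bucket for files with no extension. The function name count_extensions and the "(none)" label are made up for illustration.

```shell
# count_extensions [DIR]: histogram of file extensions under DIR (default: .).
# Hypothetical helper; assumes file names contain no newlines.
count_extensions() {
    find "${1:-.}" -type f -print |
        awk -F/ '{
            n = split($NF, parts, ".")     # $NF is the basename
            # No dot at all, or only a leading dot (e.g. .gitignore): no extension.
            if (n < 2 || (n == 2 && parts[1] == "")) {
                print "(none)"
            } else {
                print parts[n]
            }
        }' | sort | uniq -c | sort -n
}
```

Usage: run count_extensions inside the source tree, or pass a directory, e.g. count_extensions libssh-0.9.4.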
The following works for the current directory only. For example, in libssh-0.9.4/tests:
% file -b * | sort | uniq -c | sort -n
      1 Python script, ASCII text executable
      4 ASCII text
      9 directory
     19 C source, ASCII text
I'm using the -b option of file here:
-b, --brief
        Do not prepend filenames to output lines (brief mode).
Now for all files in the tree:
% find . -type f -exec file -b {} \; | sort | uniq -c | sort -n
      1 ASCII text, with no line terminators
      1 C++ source, ASCII text
      1 C source, ASCII text, with very long lines
      1 data
      1 empty
      1 HTML document, ASCII text
      1 HTML document, UTF-8 Unicode text
      1 JSON data
      1 OpenSSH DSA public key
      1 OpenSSH ED25519 public key
      1 OpenSSH private key
      1 OpenSSH RSA1 private key, version 1.1
      1 PEM DSA private key
      1 Python script, ASCII text executable
      1 ReStructuredText file, UTF-8 Unicode text
      1 UTF-8 Unicode text
      2 ASCII text, with very long lines
      2 OpenSSH ECDSA public key
      2 PEM EC private key
      2 PNG image data, 25 x 25, 8-bit/color RGBA, non-interlaced
      3 OpenSSH RSA public key
      3 ReStructuredText file, ASCII text
      4 PEM RSA private key
      8 SVG Scalable Vector Graphics image
     80 ASCII text
    245 C source, ASCII text
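The descriptions printed by file are detailed, so nearly identical files land in separate buckets ("C source, ASCII text" vs. "C source, ASCII text, with very long lines"). A possible variation, sketched under the assumption that coarser categories are enough, is to ask file for MIME types instead. The helper name count_mime_types is hypothetical. It also uses "{} +" instead of "{} \;", so find starts file once per batch of names rather than once per file.

```shell
# count_mime_types [DIR]: histogram of MIME types reported by file(1).
# Hypothetical helper name; relies on the --mime-type option of file(1).
count_mime_types() {
    # "{} +" passes many file names to each file(1) invocation.
    find "${1:-.}" -type f -exec file -b --mime-type {} + |
        sort | uniq -c | sort -n
}
```

Usage: count_mime_types . in the root of the tree. The exact MIME strings depend on the installed version of file and its magic database.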
Now let's look at file modification dates, for libssh-0.9.4 again:
% find . -type f -exec stat {} \;
  File: ./src/dh_key.c
  Size: 10023      Blocks: 24         IO Block: 4096   regular file
Device: fd00h/64768d    Inode: 31855060    Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1000/       i)   Gid: ( 1000/       i)
Access: 2021-12-07 12:31:34.074679564 +0200
Modify: 2020-04-06 12:36:35.000000000 +0300
Change: 2020-12-18 20:19:06.156283089 +0200
 Birth: -
  File: ./src/knownhosts.c
  Size: 36939      Blocks: 80         IO Block: 4096   regular file
Device: fd00h/64768d    Inode: 31855074    Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1000/       i)   Gid: ( 1000/       i)
Access: 2021-12-07 12:31:34.090679832 +0200
Modify: 2020-01-27 17:45:32.000000000 +0200
Change: 2020-12-18 20:19:06.188283745 +0200
 Birth: -
Let's narrow this info to the 'Modify' field only:
% find . -type f -exec stat {} \; | grep Modify
Modify: 2020-01-27 17:45:32.000000000 +0200
Modify: 2020-03-27 14:13:36.000000000 +0200
Modify: 2020-01-27 17:45:32.000000000 +0200
Modify: 2020-01-27 17:45:32.000000000 +0200
Modify: 2020-01-27 17:45:32.000000000 +0200
Pull only the date from these lines:
% find . -type f -exec stat {} \; | grep Modify | cut -d ' ' -f 2
2020-01-27
2020-01-27
2020-01-27
2020-01-27
2020-04-09
2020-04-09
Collect statistics. Which days are the most active?
% find . -type f -exec stat {} \; | grep Modify | cut -d ' ' -f 2 | sort | uniq -c | sort -n
      1 2018-12-07
      2 2019-01-31
      2 2020-03-30
      3 2009-06-15
      3 2013-02-07
      7 2020-03-27
      8 2016-02-15
      8 2018-09-10
      9 2020-04-06
     22 2018-10-19
     68 2020-04-09
    234 2020-01-27
Narrow to C files only, using -name (or -iname for case-insensitive matching):
% find . -name '*.c' -type f -exec stat {} \; | grep Modify | cut -d ' ' -f 2 | sort | uniq -c | sort -n
      1 2009-06-15
      1 2013-02-07
      1 2018-09-10
      1 2018-12-07
      4 2018-10-19
      5 2016-02-15
      6 2020-03-27
      8 2020-04-06
     34 2020-04-09
    113 2020-01-27
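The stat | grep | cut chain can be collapsed if GNU find is available, since its -printf action can print the modification date directly. This is a sketch under that assumption (-printf is a GNU extension, not POSIX). The function name mtime_histogram and its two optional arguments are made up for illustration.

```shell
# mtime_histogram [DIR] [PATTERN]: count files per modification date.
# Hypothetical helper; -printf is a GNU find extension (not in POSIX find).
mtime_histogram() {
    mh_dir=${1:-.}
    mh_pattern=${2:-*}
    # %TY-%Tm-%Td prints the modification time as YYYY-MM-DD (local time).
    find "$mh_dir" -type f -name "$mh_pattern" -printf '%TY-%Tm-%Td\n' |
        sort | uniq -c | sort -n
}
```

Usage: mtime_histogram . for all files, or mtime_histogram . '*.c' for C files only. Changing the format to '%TY-%Tm\n' would group by month instead of by day.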
(Updated in Dec-2023.)
Also, let's look at the timezones of the maintainers of some popular GitHub project, like LLVM.
% git remote -v
origin  https://github.com/llvm/llvm-project.git (fetch)
origin  https://github.com/llvm/llvm-project.git (push)
% git log | grep "^Date:" | cut -d ' ' -f 9 | sort | uniq -c | sort -n
      2 +0430
      9 +0330
     10 +1200
     36 +1300
     43 -1000
    129 -0300
    129 +1100
    137 +1000
    163 +0400
    213 +0500
   1355 +0700
   1574 +0530
   1672 +0900
   1701 -0600
   5247 +0300
   5534 +0800
   9600 -0500
  12015 -0400
  14380 +0200
  19276 -0800
  23018 +0100
  35887 -0700
 352210 +0000
UTC (+0) should be ignored: probably some bots. The next most popular are -7 and -8: the West Coast of the U.S. +1 and +2 are (western) Europe. -4 and -5 are the East Coast of the U.S. +8 is China. +3 is eastern Europe and Russia. Neat! Leaving UTC aside, the largest share of LLVM development happens on the West Coast of the U.S.
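Picking field 9 out of the "Date:" line depends on the exact spacing of git's default output. A slightly less fragile sketch, assuming a reasonably recent git, asks for the author date in ISO-like form via the %ai placeholder and takes the third whitespace-separated field. The helper name tz_histogram is hypothetical.

```shell
# tz_histogram: count commits per author timezone offset in the current repo.
# Hypothetical helper; %ai prints the author date as "YYYY-MM-DD HH:MM:SS +ZZZZ".
tz_histogram() {
    git log --format=%ai | awk '{print $3}' | sort | uniq -c | sort -n
}
```

Usage: cd into a clone and run tz_histogram. Like the original pipeline, it counts author dates, not committer dates.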
Now let's determine the most active hours in LLVM development.
Dates (only) for all commits:
% git log --date=format:'%Y-%m-%d %H:%M:%S %z' --pretty=format:'%ad'
2023-12-19 11:46:30 -0700
2023-12-19 10:44:18 -0800
2023-12-19 11:39:00 -0700
2023-12-19 19:32:17 +0100
...
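Ouch, each of these dates is in its author's own timezone. But I can handle this by adding '--date=local', which overrides the earlier --date=format and prints every date in the local timezone of the machine: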
% git log --date=format:'%Y-%m-%d %H:%M:%S %z' --pretty=format:'%ad' --all --date=local
Tue Dec 19 19:46:30 2023
Tue Dec 19 19:44:18 2023
Tue Dec 19 19:39:00 2023
Tue Dec 19 19:32:17 2023
Tue Dec 19 19:26:23 2023
...
I ran this on a server in the CET time zone (Germany), so all dates are converted to CET.
Filter only HH:MM:SS:
% git log --date=format:'%Y-%m-%d %H:%M:%S %z' --pretty=format:'%ad' --all --date=local | cut -d ' ' -f 4
19:46:30
19:44:18
19:39:00
19:32:17
19:26:23
19:11:42
...
Filter only hour and show statistics:
% git log --date=format:'%Y-%m-%d %H:%M:%S %z' --pretty=format:'%ad' --all --date=local | cut -d ' ' -f 4 | cut -d ':' -f 1 | sort | uniq -c
  30984 00
  28691 01
  24164 02
  17517 03
  12288 04
  11004 05
  12132 06
  12993 07
  13695 08
  13022 09
  13935 10
  14335 11
  13449 12
  12357 13
  15000 14
  17468 15
  19965 16
  22351 17
  27073 18
  31797 19
  33739 20
  27708 21
  33076 22
  33678 23
As you can see, the most active hours are from 19:00 to 01:00 (CET): evening and night in Europe, daytime in the U.S.
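The result above depends on the timezone of the machine running the command. A sketch that makes the target timezone explicit, assuming a git version that supports the "-local" date format suffix (format-local), could look like this. The helper name commit_hours and the Europe/Berlin default are only examples.

```shell
# commit_hours [TZ]: count commits per hour of day, converted to timezone TZ.
# Hypothetical helper; the default Europe/Berlin is only an example.
commit_hours() {
    # format-local renders each author date in the timezone taken from $TZ;
    # %H keeps only the hour (00..23).
    TZ=${1:-Europe/Berlin} git log --all --date=format-local:%H --format=%ad |
        sort | uniq -c
}
```

Usage: commit_hours prints the histogram in German time, while commit_hours America/Los_Angeles shows the same commits from a West Coast point of view. This removes the need for the two cut stages, since git prints only the hour.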