find . -print | awk -F . '{print $NF}' | sort | uniq -c | sort -n
(awk -F . '{print $NF}' keeps only the last dot-separated field, i.e. the file extension.)
For example, for the libssh-0.9.4 project:
...
1 yml
2 md
2 png
2 sh
6 in
8 svg
9 pub
12 dox
14 symbols
14 txt
27 cmake
65 h
174 c
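One caveat of the one-liner above: it also counts directories, and a name without any dot ends up as its own "extension". Here is a minimal sketch that restricts the count to regular files and groups dotless names under a placeholder. The function name count_extensions and the "(none)" label are my own, purely for illustration; file names containing newlines are not handled.

```shell
# count_extensions DIR: histogram of file extensions under DIR (regular files only).
# The function name and the "(none)" label are illustrative, not part of the original one-liner.
count_extensions() {
    find "${1:-.}" -type f -print |
        awk -F / '{
            name = $NF                      # last path component (the file name)
            n = split(name, parts, ".")
            # no dot at all, or only a leading dot (e.g. ".gitignore"): no extension
            if (n < 2 || (n == 2 && parts[1] == "")) ext = "(none)"
            else ext = parts[n]
            print ext
        }' |
        sort | uniq -c | sort -n
}
```

Usage: count_extensions . (or pass any directory as the first argument).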
The file utility classifies files by content rather than by name. Run with a plain glob, it covers the current directory only. For example, in libssh-0.9.4/tests:
% file -b * | sort | uniq -c | sort -n
1 Python script, ASCII text executable
4 ASCII text
9 directory
19 C source, ASCII text
I'm using the -b option of the file utility here:
-b, --brief
Do not prepend filenames to output lines (brief mode).
Now for all files in the tree:
% find . -type f -exec file -b {} \; | sort | uniq -c | sort -n
1 ASCII text, with no line terminators
1 C++ source, ASCII text
1 C source, ASCII text, with very long lines
1 data
1 empty
1 HTML document, ASCII text
1 HTML document, UTF-8 Unicode text
1 JSON data
1 OpenSSH DSA public key
1 OpenSSH ED25519 public key
1 OpenSSH private key
1 OpenSSH RSA1 private key, version 1.1
1 PEM DSA private key
1 Python script, ASCII text executable
1 ReStructuredText file, UTF-8 Unicode text
1 UTF-8 Unicode text
2 ASCII text, with very long lines
2 OpenSSH ECDSA public key
2 PEM EC private key
2 PNG image data, 25 x 25, 8-bit/color RGBA, non-interlaced
3 OpenSSH RSA public key
3 ReStructuredText file, ASCII text
4 PEM RSA private key
8 SVG Scalable Vector Graphics image
80 ASCII text
245 C source, ASCII text
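The listing above is quite fine-grained (PNG dimensions, "with very long lines" and so on become separate buckets), and -exec ... \; starts one file process per file. A possible variation, assuming the common file(1) implementation that supports --mime-type: group by MIME type and let find batch the arguments with {} +. The function name file_type_histogram is made up for this sketch.

```shell
# file_type_histogram DIR: count regular files under DIR by MIME type.
# Assumes a file(1) that supports --mime-type; the function name is illustrative.
file_type_histogram() {
    # "{} +" passes many paths to each file invocation instead of one per process
    find "${1:-.}" -type f -exec file -b --mime-type {} + |
        sort | uniq -c | sort -n
}
```

Usage: file_type_histogram . The buckets become things like text/plain or image/png, so the histogram is coarser than the one above.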
Next, file timestamps. For libssh-0.9.4 again:
% find . -type f -exec stat {} \;
File: ./src/dh_key.c
Size: 10023 Blocks: 24 IO Block: 4096 regular file
Device: fd00h/64768d Inode: 31855060 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 1000/ i) Gid: ( 1000/ i)
Access: 2021-12-07 12:31:34.074679564 +0200
Modify: 2020-04-06 12:36:35.000000000 +0300
Change: 2020-12-18 20:19:06.156283089 +0200
Birth: -
File: ./src/knownhosts.c
Size: 36939 Blocks: 80 IO Block: 4096 regular file
Device: fd00h/64768d Inode: 31855074 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 1000/ i) Gid: ( 1000/ i)
Access: 2021-12-07 12:31:34.090679832 +0200
Modify: 2020-01-27 17:45:32.000000000 +0200
Change: 2020-12-18 20:19:06.188283745 +0200
Birth: -
Let's narrow this down to the 'Modify' field only:
% find . -type f -exec stat {} \; | grep Modify
Modify: 2020-01-27 17:45:32.000000000 +0200
Modify: 2020-03-27 14:13:36.000000000 +0200
Modify: 2020-01-27 17:45:32.000000000 +0200
Modify: 2020-01-27 17:45:32.000000000 +0200
Modify: 2020-01-27 17:45:32.000000000 +0200
Pull only the date from these lines:
% find . -type f -exec stat {} \; | grep Modify | cut -d ' ' -f 2
2020-01-27
2020-01-27
2020-01-27
2020-01-27
2020-04-09
2020-04-09
Collect statistics: which days were the most active?
% find . -type f -exec stat {} \; | grep Modify | cut -d ' ' -f 2 | sort | uniq -c | sort -n
1 2018-12-07
2 2019-01-31
2 2020-03-30
3 2009-06-15
3 2013-02-07
7 2020-03-27
8 2016-02-15
8 2018-09-10
9 2020-04-06
22 2018-10-19
68 2020-04-09
234 2020-01-27
Narrow to C source files only, using -name (or -iname for case-insensitive matching):
% find . -name '*.c' -type f -exec stat {} \; | grep Modify | cut -d ' ' -f 2 | sort | uniq -c | sort -n
1 2009-06-15
1 2013-02-07
1 2018-09-10
1 2018-12-07
4 2018-10-19
5 2016-02-15
6 2020-03-27
8 2020-04-06
34 2020-04-09
113 2020-01-27
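The stat | grep | cut chain depends on the exact layout of stat's human-readable output. If GNU find is available (an assumption: BSD/macOS find has no -printf), it can print the modification date directly. A minimal sketch, with an illustrative function name:

```shell
# mtime_histogram DIR [PATTERN]: count regular files by modification date (YYYY-MM-DD).
# Assumes GNU find, since -printf is a GNU extension; the function name is illustrative.
mtime_histogram() {
    # %TY, %Tm, %Td: year, month and day of the file's last modification time
    find "${1:-.}" -type f -name "${2:-*}" -printf '%TY-%Tm-%Td\n' |
        sort | uniq -c | sort -n
}
```

Usage: mtime_histogram . for all files, or mtime_histogram . '*.c' for C sources only.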
(Updated in Dec-2023.)
Also, let's look at the timezones of the contributors to a popular GitHub project, like LLVM.
% git remote -v
origin https://github.com/llvm/llvm-project.git (fetch)
origin https://github.com/llvm/llvm-project.git (push)
% git log | grep "^Date:" | cut -d ' ' -f 9 | sort | uniq -c | sort -n
2 +0430
9 +0330
10 +1200
36 +1300
43 -1000
129 -0300
129 +1100
137 +1000
163 +0400
213 +0500
1355 +0700
1574 +0530
1672 +0900
1701 -0600
5247 +0300
5534 +0800
9600 -0500
12015 -0400
14380 +0200
19276 -0800
23018 +0100
35887 -0700
352210 +0000
UTC (+0000) should be ignored: probably bots. The next most popular offsets are -0700 and -0800: the U.S. west coast. +0100 and +0200 are (western) Europe. -0400 and -0500 are the U.S. east coast. +0800 is China. +0300 is eastern Europe and Russia. Neat! UTC aside, the largest share of LLVM development happens on the U.S. west coast.
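The cut -d ' ' -f 9 trick relies on the column position of the offset in git's default "Date:" line. As an alternative sketch, git can print just the offset via the %z specifier of --date=format (the same specifier used in the commands further down). The function name tz_histogram is made up for illustration.

```shell
# tz_histogram [git-log args...]: count commits per author UTC offset in the current repository.
# Extra arguments (for example --all) are passed to git log unchanged.
# The function name is illustrative.
tz_histogram() {
    git log --date=format:'%z' --pretty=format:'%ad' "$@" |
        sort | uniq -c | sort -n
}
```

Usage: run tz_histogram inside a clone, or tz_histogram --all to include every branch.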
Now let's determine the most active hours in LLVM development.
Dates (only) for all commits:
% git log --date=format:'%Y-%m-%d %H:%M:%S %z' --pretty=format:'%ad'
2023-12-19 11:46:30 -0700
2023-12-19 10:44:18 -0800
2023-12-19 11:39:00 -0700
2023-12-19 19:32:17 +0100
...
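Ouch, each date is in its author's own timezone. I can fix this by adding '--date=local', which converts everything to the local timezone of the machine (it also overrides the earlier --date=format, so the output switches to git's default layout):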
% git log --date=format:'%Y-%m-%d %H:%M:%S %z' --pretty=format:'%ad' --all --date=local
Tue Dec 19 19:46:30 2023
Tue Dec 19 19:44:18 2023
Tue Dec 19 19:39:00 2023
Tue Dec 19 19:32:17 2023
Tue Dec 19 19:26:23 2023
...
I ran this on a server in the CET time zone (Germany), so all dates are converted to CET.
Filter only HH:MM:SS:
% git log --date=format:'%Y-%m-%d %H:%M:%S %z' --pretty=format:'%ad' --all --date=local | cut -d ' ' -f 4
19:46:30
19:44:18
19:39:00
19:32:17
19:26:23
19:11:42
...
Filter only hour and show statistics:
% git log --date=format:'%Y-%m-%d %H:%M:%S %z' --pretty=format:'%ad' --all --date=local | cut -d ' ' -f 4 | cut -d ':' -f 1 | sort | uniq -c
30984 00
28691 01
24164 02
17517 03
12288 04
11004 05
12132 06
12993 07
13695 08
13022 09
13935 10
14335 11
13449 12
12357 13
15000 14
17468 15
19965 16
22351 17
27073 18
31797 19
33739 20
27708 21
33076 22
33678 23
As you can see, the most active hours are between 19:00 and 01:00 CET: night in Europe and/or daytime in the U.S.
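The same histogram can probably be had without the two cut stages. A reasonably recent git accepts --date=format-local:..., which applies a strftime format after converting to the local timezone, and the TZ environment variable chooses that timezone. A minimal sketch; the function name hour_histogram is my own, and the zone name in the usage line is an assumption standing in for the CET server.

```shell
# hour_histogram [TZ_NAME] [git-log args...]: commits per hour of day,
# with all dates converted to one timezone (default: UTC).
# The function name is illustrative.
hour_histogram() {
    zone="${1:-UTC}"
    [ "$#" -gt 0 ] && shift
    # format-local: format with strftime after converting to the timezone given by TZ
    TZ="$zone" git log --date=format-local:'%H' --pretty=format:'%ad' "$@" |
        sort | uniq -c
}
```

Usage: hour_histogram Europe/Berlin --all inside a clone. The first argument is any value the system accepts for TZ.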
