[Unix] Using the 'uniq -c' command to get some statistics, part II

Previously: 1, 2.

File extension statistics

find . -print | awk -F . '{print $NF}' | sort | uniq -c | sort -n

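(awk -F . '{print $NF}' splits each path on dots and prints the last field, which is the file extension. For a name without a dot, such as ./README, it prints /README instead.)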

For example, for the libssh-0.9.4 project:

...
      1 yml
      2 md
      2 png
      2 sh
      6 in
      8 svg
      9 pub
     12 dox
     14 symbols
     14 txt
     27 cmake
     65 h
    174 c
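
The one-liner above also counts directories and names without a dot. Here is a minimal sketch of a stricter variant, assuming you only want regular files whose names contain a dot. ext_stats is a made-up helper name, not something from the project:

```shell
# ext_stats: hypothetical helper, counts file extensions under a directory.
# Usage: ext_stats [DIR]   (DIR defaults to the current directory)
ext_stats() {
    # -type f     : regular files only, so directories are skipped
    # -name '*.*' : only names that contain a dot
    # awk         : keep the text after the last dot of each path
    find "${1:-.}" -type f -name '*.*' \
        | awk -F . '{print $NF}' \
        | sort | uniq -c | sort -n
}
```

Run it as ext_stats libssh-0.9.4, or with no argument for the current directory. A dotfile such as .gitignore would still be reported as the extension "gitignore".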

File type statistics

This works for the current directory only. For example, in libssh-0.9.4/tests:

 % file -b * | sort | uniq -c | sort -n
      1 Python script, ASCII text executable
      4 ASCII text
      9 directory
     19 C source, ASCII text

I'm using the -b option of the file command here:

     -b, --brief
             Do not prepend filenames to output lines (brief mode).

Now for all files in the tree:

 % find . -type f -exec file -b {} \; | sort | uniq -c | sort -n

      1 ASCII text, with no line terminators
      1 C++ source, ASCII text
      1 C source, ASCII text, with very long lines
      1 data
      1 empty
      1 HTML document, ASCII text
      1 HTML document, UTF-8 Unicode text
      1 JSON data
      1 OpenSSH DSA public key
      1 OpenSSH ED25519 public key
      1 OpenSSH private key
      1 OpenSSH RSA1 private key, version 1.1
      1 PEM DSA private key
      1 Python script, ASCII text executable
      1 ReStructuredText file, UTF-8 Unicode text
      1 UTF-8 Unicode text
      2 ASCII text, with very long lines
      2 OpenSSH ECDSA public key
      2 PEM EC private key
      2 PNG image data, 25 x 25, 8-bit/color RGBA, non-interlaced
      3 OpenSSH RSA public key
      3 ReStructuredText file, ASCII text
      4 PEM RSA private key
      8 SVG Scalable Vector Graphics image
     80 ASCII text
    245 C source, ASCII text
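
Many of the single-count lines above differ only in the details after the first comma. One possible way to group them, as a sketch: keep just the first comma-separated field before counting. group_types is a hypothetical name; it reads file -b output on standard input:

```shell
# group_types: hypothetical helper, reads 'file -b' output on stdin and
# counts entries by the part of the description before the first comma.
group_types() {
    cut -d , -f 1 | sort | uniq -c | sort -n
}

# Intended usage (commented out so that the block only defines the helper):
#   find . -type f -exec file -b {} + | group_types
```

With this, "C source, ASCII text" and "C source, ASCII text, with very long lines" are both counted as "C source". The '{} +' form in the usage line hands many file names to each file invocation, which should be faster than '{} \;' on a large tree.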

Most active days of the project's development

For libssh-0.9.4 again:

 % find . -type f -exec stat {} \;

  File: ./src/dh_key.c
  Size: 10023           Blocks: 24         IO Block: 4096   regular file
Device: fd00h/64768d    Inode: 31855060    Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1000/       i)   Gid: ( 1000/       i)
Access: 2021-12-07 12:31:34.074679564 +0200
Modify: 2020-04-06 12:36:35.000000000 +0300
Change: 2020-12-18 20:19:06.156283089 +0200
 Birth: -
  File: ./src/knownhosts.c
  Size: 36939           Blocks: 80         IO Block: 4096   regular file
Device: fd00h/64768d    Inode: 31855074    Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1000/       i)   Gid: ( 1000/       i)
Access: 2021-12-07 12:31:34.090679832 +0200
Modify: 2020-01-27 17:45:32.000000000 +0200
Change: 2020-12-18 20:19:06.188283745 +0200
 Birth: -

Let's narrow this info down to the 'Modify' field only:

 % find . -type f -exec stat {} \; | grep Modify

Modify: 2020-01-27 17:45:32.000000000 +0200
Modify: 2020-03-27 14:13:36.000000000 +0200
Modify: 2020-01-27 17:45:32.000000000 +0200
Modify: 2020-01-27 17:45:32.000000000 +0200
Modify: 2020-01-27 17:45:32.000000000 +0200

Extract only the date from these lines:

 % find . -type f -exec stat {} \; | grep Modify | cut -d ' ' -f 2

2020-01-27
2020-01-27
2020-01-27
2020-01-27
2020-04-09
2020-04-09
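
The stat output shown above is in the GNU coreutils format, so a shorter route may be its -c option with the %y format, which prints only the modification time. That would also avoid grep matching a file name that happens to contain "Modify". A sketch, assuming GNU stat; mod_dates is a hypothetical helper name:

```shell
# mod_dates: hypothetical helper, prints the modification date (YYYY-MM-DD)
# of every regular file under a directory. Assumes GNU stat.
# Usage: mod_dates [DIR]   (DIR defaults to the current directory)
mod_dates() {
    # stat -c %y prints e.g. "2020-01-27 17:45:32.000000000 +0200";
    # cut keeps the first space-separated field, i.e. the date.
    find "${1:-.}" -type f -exec stat -c %y {} + | cut -d ' ' -f 1
}
```

The result can be piped into sort | uniq -c | sort -n in the same way as the grep and cut pipeline.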

Collect statistics: which days are the most active?

 % find . -type f -exec stat {} \; | grep Modify | cut -d ' ' -f 2 | sort | uniq -c | sort -n
      1 2018-12-07
      2 2019-01-31
      2 2020-03-30
      3 2009-06-15
      3 2013-02-07
      7 2020-03-27
      8 2016-02-15
      8 2018-09-10
      9 2020-04-06
     22 2018-10-19
     68 2020-04-09
    234 2020-01-27

Narrow to C source files only, using -name (or -iname for a case-insensitive match):

 % find . -name '*.c' -type f -exec stat {} \; | grep Modify | cut -d ' ' -f 2 | sort | uniq -c | sort -n
      1 2009-06-15
      1 2013-02-07
      1 2018-09-10
      1 2018-12-07
      4 2018-10-19
      5 2016-02-15
      6 2020-03-27
      8 2020-04-06
     34 2020-04-09
    113 2020-01-27
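
If GNU find is available, the whole thing can probably be done without stat at all, since its -printf action can print the modification date directly. A sketch under that assumption; mtime_days is a hypothetical helper name, and the optional second argument is a -name pattern:

```shell
# mtime_days: hypothetical helper, counts files per modification day.
# Assumes GNU find (-printf is a GNU extension).
# Usage: mtime_days [DIR] [NAME_PATTERN]
mtime_days() {
    # %TY-%Tm-%Td expands to the file's modification date as YYYY-MM-DD.
    find "${1:-.}" -type f -name "${2:-*}" -printf '%TY-%Tm-%Td\n' \
        | sort | uniq -c | sort -n
}
```

For example, mtime_days . '*.c' should give the same kind of table as the command above, with a single find process instead of one stat per file.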

As seen on reddit.

