[Unix] Tracking changes on web1.0-style websites

There are several old-school web1.0-style websites that don't have an RSS feed, let alone social network accounts. For example, those of Fabrice Bellard and Donald Knuth.

I wrote a simple bash script that tracks changes on these websites using plain lynx and stores all the info in a git repo.
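
A minimal sketch of the idea, not the actual script: dump the page as text with lynx, compare it with the previous snapshot, and print a report only when something changed. The function names (url_to_filename, fetch_text, check_url) and the .new temporary file are hypothetical. The filename scheme (':' and '/' replaced by '_', plus an .old suffix) is inferred from the snapshot names shown later in this post.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not the author's script.

# Map a URL to a flat filename by replacing ':' and '/' with '_'
# (inferred from names such as https___www.ioccc.org_.old).
url_to_filename() {
    printf '%s' "$1" | tr ':/' '__'
}

# Render a page as plain text. Assumes lynx is installed.
fetch_text() {
    lynx -dump "$1"
}

# Fetch one URL, report a change on stdout, keep the new snapshot.
check_url() {
    local url=$1 old new
    old="$(url_to_filename "$url").old"
    new="$(url_to_filename "$url").new"
    # On a failed fetch, keep the previous snapshot untouched.
    fetch_text "$url" > "$new" || { rm -f "$new"; return 1; }
    if [ -f "$old" ] && ! cmp -s "$old" "$new"; then
        printf '%s: %s changed\n' "${0##*/}" "$url"
        # diff exits with status 1 when the files differ; that is expected.
        diff "$old" "$new" || true
    fi
    mv "$new" "$old"
}
```

Running check_url https://www.ioccc.org/ inside the snapshot directory prints nothing on the first run or when the page is unchanged, so a cron job built on it stays silent unless there is news.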

Download.

It's meant to be run by cron. Once a day I get reports like:

web_updates_fn.sh: https://www.ioccc.org/ changed
(removed)    The winners of the 27^th IOCCC have been announced. Congratulations!
(added)    The source winners of the 27^th IOCCC have been released.
(added)    Congratulations!
(removed)    Please see the following news items.
(added)    Right now, you cannot submit an entry becasue the IOCCC is not open. We
(added)    do plan to hold IOCCC28 in 2021.
(added)
(added)    Please see the watch the IOCCC news below for information on the next
(added)    IOCCC.

...

web_updates_fn.sh: https://www-cs-faculty.stanford.edu/~knuth/news.html changed
(removed) Happy MMXX to all!
(added) Happy MMXXI to all!
(removed)    I celebrated my 82nd birthday this year by watching a marvelous video
(removed)    of the Czech première of my multimedia composition Fantasia
(removed)    Apocalyptica.
(removed)
(removed) A happy π birthday
(removed)
(removed)    Johan de Ruiter sent me a great puzzle for my birthday this year!
(added)    This is the year when I promised to do the seven-year cleanup of TeX
(added)    and METAFONT (last updated in 2014). I hope to have that completed
(added)    soon.
(removed) Oral histories
(removed)

I also track new papers on arXiv about SMT solvers:

web_updates_fn.sh: https://arxiv.org/search/?query=smt+solver&searchtype=all&source=header changed
(removed) Showing 1–50 of 281 results for all: smt solver
(added) Showing 1–50 of 282 results for all: smt solver
(removed)     1. arXiv:2012.06370  [pdf, ps, other]
(added)     1. arXiv:2012.14235  [pdf, other]
(added)        cs.FL cs.AI
(added)        FOREST: An Interactive Multi-tree Synthesizer for Regular
(added)        Expressions
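
The "(removed)" and "(added)" lines in the reports above look like plain diff output with the '<' and '>' markers rewritten. One way to produce that format, as a sketch under that assumption (report_changes is a hypothetical name, and the real script may do this differently):

```bash
#!/usr/bin/env bash
# Hypothetical sketch: turn normal diff output into "(removed)"/"(added)" lines.

report_changes() {
    # Keep only the changed lines; hunk headers such as "1c1" and the
    # "---" separators are dropped because sed -n prints matches only.
    # diff exits with status 1 when the files differ, hence the "|| true".
    diff "$1" "$2" \
        | sed -n -e 's/^< /(removed) /p' -e 's/^> /(added) /p' \
        || true
}
```

For two snapshots that differ only in the line "Happy MMXX to all!" versus "Happy MMXXI to all!", this prints one "(removed)" line followed by one "(added)" line, in the same order as in the Knuth report above.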

My script dumps this to stdout, and cron emails it all to me (cron emails whatever a job writes to stdout to the job's owner). This works on my server.

But you can use it on your standalone Unix box, of course.
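
For reference, a crontab entry for such a setup could look like the fragment below. This is only a sketch: the directory and the mail address are placeholders, and the 01:00 schedule is a guess based on the commit timestamps shown later in this post. MAILTO is optional; without it, cron mails the output to the crontab's owner.

```
# Hypothetical crontab entry; the path and the address are placeholders.
MAILTO=you@example.org
# Run daily at 01:00; whatever the script prints to stdout is mailed by cron.
0 1 * * * cd "$HOME/web_updates" && ./web_updates_fn.sh
```

Since cron sends mail only when a job produces output, a script that stays quiet for unchanged pages generates no email on quiet days.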

The git repo tracks the history of changes, like:

 % git log -p https___www.ioccc.org_.old

commit f37645293a448ab4f142fd4cbf34933969d967e5
Author: <...>
Date:   Tue Jan 5 01:00:27 2021 +0100

    ...

diff --git a/https___www.ioccc.org_.old b/https___www.ioccc.org_.old
index 621bb1e..fed0260 100644
--- a/https___www.ioccc.org_.old
+++ b/https___www.ioccc.org_.old
@@ -7,9 +7,14 @@
             IOCCC news | People who have won | Winning entries ]
      __________________________________________________________________

-   The winners of the 27^th IOCCC have been announced. Congratulations!
+   The source winners of the 27^th IOCCC have been released.
+   Congratulations!

-   Please see the following news items.
+   Right now, you cannot submit an entry becasue the IOCCC is not open. We
+   do plan to hold IOCCC28 in 2021.
+
+   Please see the watch the IOCCC news below for information on the next
+   IOCCC.
      __________________________________________________________________

 % git log -p https___www-cs-faculty.stanford.edu_~knuth_news.html.old

commit bf690231ac6489644d0fa7d08368e19997987684
Author: <...>
Date:   Sat Jan 2 01:00:25 2021 +0100

    ...

diff --git a/https___www-cs-faculty.stanford.edu_~knuth_news.html.old b/https___www-cs-faculty.stanford.edu_~knuth_news.html.old
index 75fdf30..36670c3 100644
--- a/https___www-cs-faculty.stanford.edu_~knuth_news.html.old
+++ b/https___www-cs-faculty.stanford.edu_~knuth_news.html.old
@@ -2,81 +2,17 @@

    [click here to zip down to the schedule of public lectures]

-Happy MMXX to all!
+Happy MMXXI to all!

-   I celebrated my 82nd birthday this year by watching a marvelous video
-   of the Czech première of my multimedia composition Fantasia
-   Apocalyptica.
-
-A happy π birthday
-
-   Johan de Ruiter sent me a great puzzle for my birthday this year!
+   This is the year when I promised to do the seven-year cleanup of TeX
+   and METAFONT (last updated in 2014). I hope to have that completed
+   soon.

 However, a sad word ladder

    VIRUS - VIRES - FIRES - FIRER - FIVER - FEVER
    (fortunately my family and I are still healthy)
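
A history like the one above can be built by committing each snapshot whenever it changes. A minimal sketch, assuming the snapshots live in an already initialized git repo; commit_snapshot and the commit message are hypothetical, and the real script's messages are not shown here:

```bash
#!/usr/bin/env bash
# Hypothetical sketch: record a snapshot file in git only when it changed.

commit_snapshot() {
    local file=$1 url=$2
    git add -- "$file"
    # "git diff --cached --quiet" exits non-zero when the staged file
    # differs from the last commit, so unchanged pages create no commit.
    if ! git diff --cached --quiet -- "$file"; then
        git commit -q -m "$url changed"
    fi
}
```

After that, git log -p on a snapshot file shows every recorded change to that page, one commit per detected update.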

And of course, it has no problem tracking RSS feeds, like the BBC News RSS feed.

