All US-listed companies are required to promptly file any material investor announcements with the SEC, and several websites track the SEC EDGAR system and alert users of any changes; I use secfilings.com. But I have a problem with my small foreign holdings and with US companies that are not required to file with the SEC. All such companies that I own are responsible companies which report anything relevant to the public on the investor section of their website, but I generally cannot get notifications from them. In the past, I just counted on checking their websites regularly, especially around the time of their quarterly announcements.
I've occasionally thought about tackling the problem of how to track website changes. I researched online and tried some of the nicer online tools, but they all charge for the service (visualping.io, for example). That is reasonable, since such a tool can only detect changes by brute-force querying of the site at periodic intervals.
But the New Century fiasco spurred me to action. Instead of paying, I decided, as with everything else I do related to investing, to go DIY. This method requires a computer running Linux or a Unix-like system such as macOS or Android; if you must use a Windows PC, you can install Cygwin. The computer must be connected to the internet, and preferably should be always on. I have a Linux home computer connected to the internet 24/7.
This method checks the websites using a simple Perl script. Perl is a basic command-line tool included in virtually every Linux installation, and if it isn't installed, you can easily install it manually. The script, when run the first time, creates a base copy of each webpage that I want to track. Then every hour it checks those websites again and compares the current webpage with the base copy. If the script detects a meaningful difference, it stops. The next time I check the script window, I'll know which website changed. To resume, I first tell it to overwrite the old base copy with the newly changed webpage file.
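The overwrite itself is trivial: conceptually it is nothing more than copying the new download over the old base copy. A minimal sketch of that step, using the putprop file names from the example further below (the exact command in the script may differ):

use File::Copy;    # standard module that provides copy()
# Accept the changed page: the freshly downloaded temp.html becomes the
# new base copy, so future comparisons are made against it.
copy("temp.html", "putprop.html") or die "could not refresh base copy: $!";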
The first part of the script is a list of website URLs. For each URL, I also give it a name and keywords to ignore. The ignored keywords prevent the script from excessively flagging minor changes such as the date or the current stock quote. See below.
$url[$i]{url} = "http://www.putprop.co.za/content/1997/1982/sens-announcements";
$url[$i]{name} = "putprop";
$url[$i]{exception} = "Parsing Time:";

The second part of the script iterates through all the URLs in the database. For each URL, it downloads a copy of the webpage into temp.html. Next, the script filters out any exceptions. Then it compares temp.html with the previously stored base copy of the webpage. In the above example, the base copy is putprop.html.

use File::Compare;    # provides compare(); this goes at the top of the script

$urls = $url[$i]{url};
$o = $url[$i]{name};
$o .= ".html";    # name of the base copy, e.g. putprop.html
$ret = system("wget -O temp.html $urls");    # download the current page
$temp = $url[$i]{exception};
if ($temp ne "") {
    # strip lines matching the exception keywords before comparing
    $temp = "-v -E \'$temp\' ";
    $temp = " grep $temp temp.html \> xx ";
    print ("exception: $temp\n");
    system (" $temp ");
    system (" mv xx temp.html");
}
print ("======================================\n");
if (compare("temp.html", $o) == 0) {    # 0 means the files are identical
    print ("they are equal $o\n");
} else {
    print ("they are NOT equal $o $urls\n");
    exit(1);    # abort so the change gets my attention
}
print ("======================================\n");

As the code shows, if the two files match, the script proceeds. But if they differ, the script aborts, so the next time I check the script I'll know that I should look at that website.
The final part of the script is a loop which wakes up once an hour and repeats the above process. I won't show that portion of the code, but below is a snapshot of how the program looks on my Linux box. Note that it last woke up at 9:24 AM and has run for 16 iterations without finding any differences. The last URL it looked at belonged to Combined Motor Holdings, a South African company.
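For readers who want the general idea of that loop, it is just an endless cycle with a one-hour sleep. A rough sketch, not the actual code:

# Illustrative sketch of the hourly loop; variable names and output are assumptions.
$iteration = 0;
while (1) {
    $iteration++;
    print ("woke up at ", scalar localtime, ", iteration $iteration\n");
    for ($i = 0; $i <= $#url; $i++) {
        # ... download, filter exceptions, and compare, as shown above ...
    }
    sleep (3600);    # wait one hour before the next pass
}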
If you'd like a copy of the script, please make a request in the comment section.