[sword-devel] FYI geo IP lookups of repo access

DM Smith dmsmith at crosswire.org
Sun Sep 10 19:20:44 MST 2017


The country information is interesting. I’ve found that bots also skew the counts.

In my bin dir on the CW server, I have a Perl program, ~/bin/moduleScrape.pl, that slogs through the logs to figure out module downloads, counting each download once rather than once for each of its parts. It first goes through the conf files to find each module in the repository and picks a single file for each module. Then it goes through the log files (FTP and HTTP) looking for downloads of modules (including zip files). It tosses hits by bots. The output format is normalized to:
Date      Module      Format  Transport  IP               Country        Simplified agent
20150628  Easton      prt     FTP        xxx.xxx.xxx.xxx  United States  W4.0.2 at xiphos.org
20150628  PolGdanska  zip     HTTP       xxx.xxx.xxx.xxx  Poland         Apache-HttpClient/UNAVAILABLE (java 1.4)
Note that the IP is obscured here.
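
The counting approach described above can be sketched roughly as follows. This is a minimal Python illustration, not the Perl script itself: the file paths, the bot markers, and the (path, agent) input shape are all hypothetical, and the real program derives its per-module file from the conf files and reads the server logs directly.

```python
# Hypothetical mapping: one representative file per module, so a download
# that fetches many files is still counted once. Paths are illustrative only.
REPRESENTATIVE_FILES = {
    "/modules/easton/easton.idx": ("Easton", "prt"),
    "/zips/PolGdanska.zip": ("PolGdanska", "zip"),
}

# Illustrative substrings that mark a user agent as a bot (an assumption).
BOT_MARKERS = ("bot", "crawler", "spider")


def is_bot(agent):
    """Return True if the user agent string looks like a bot."""
    lowered = agent.lower()
    return any(marker in lowered for marker in BOT_MARKERS)


def count_downloads(hits):
    """Yield (module, format) once per hit on a module's representative file.

    `hits` is an iterable of (path, agent) pairs already parsed from a log.
    Hits from bots and hits on any other file are skipped.
    """
    for path, agent in hits:
        if is_bot(agent):
            continue
        module = REPRESENTATIVE_FILES.get(path)
        if module is not None:
            yield module
```

Keying on a single file per module is what keeps a module with dozens of image files from being counted dozens of times.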

The program needs tweaking for each server, as it “knows” CrossWire’s repositories and its logs.

There are a bunch of flags that let you specify a date range, and it is geared to find the last full month.
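
For illustration, the "last full month" range could be computed along these lines. This is a sketch in Python with a hypothetical function name; it does not reflect the script's actual flags or code.

```python
import datetime


def last_full_month(today=None):
    """Return (first_day, last_day) of the calendar month before `today`."""
    if today is None:
        today = datetime.date.today()
    first_of_this_month = today.replace(day=1)
    # Stepping back one day from the 1st lands on the last day of the
    # previous month, whatever its length.
    last_day = first_of_this_month - datetime.timedelta(days=1)
    first_day = last_day.replace(day=1)
    return first_day, last_day


start, end = last_full_month(datetime.date(2017, 9, 10))
# Formatted like the Date column of the normalized output.
print(start.strftime("%Y%m%d"), end.strftime("%Y%m%d"))
```

For the date of this message, that gives the range 20170801 through 20170831.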

The program was started by J Ansorg and improved by N Carter.

I also have a program, moduleStats, that runs this program and analyzes the output to produce statistics about the modules.
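
As a rough idea of that kind of analysis, here is a minimal Python sketch that parses the normalized output and tallies downloads per module and per country. It is not moduleStats: the function names are hypothetical, and it assumes the columns are separated by a tab or by two or more whitespace characters, so that single spaces inside "United States" or the agent string survive.

```python
import re
from collections import Counter

FIELDS = ("date", "module", "format", "transport", "ip", "country", "agent")


def parse_line(line):
    """Split one normalized output line into a dict, or return None."""
    # Assumption: columns are separated by a tab or by 2+ whitespace chars.
    parts = re.split(r"\s{2,}|\t", line.strip(), maxsplit=len(FIELDS) - 1)
    if len(parts) != len(FIELDS):
        return None
    return dict(zip(FIELDS, parts))


def tally(lines):
    """Return (downloads per module, downloads per country) as Counters."""
    by_module = Counter()
    by_country = Counter()
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue  # skip blank or malformed lines
        by_module[record["module"]] += 1
        by_country[record["country"]] += 1
    return by_module, by_country
```

Calling `by_module.most_common(10)` on the result would list the ten most-downloaded modules for the period.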

Troy and I have been talking about tossing the data into a database.

DM


> On Sep 10, 2017, at 5:38 PM, Karl Kleinpaste <karl at kleinpaste.org> wrote:
> 
> Now and then I get curious about where all the accesses to ftp.xiphos.org come from. This is a crude summary from my /var/log/xferlog since early August. Counts of accesses can be gotten by substituting the last "uniq" stage of the pipeline with "uniq -c | sort -nr" but such counts are registering individual files accessed, which is not very informative, especially for modules that include dozens of image files.
> 
> cat xferlog* | cut -f7 -d' ' | sed -e s/::ffff:// | sort | uniq -c | sort -nr | awk '{ print $2 }' | fgrep . | while read ip ; do geoiplookup $ip ; done | grep 'GeoIP Country Edition' | sed -e 's/GeoIP Country Edition: //' | sort | uniq
