[sword-devel] module statistics

Greg Hellings greg.hellings at gmail.com
Fri Aug 29 12:47:06 MST 2008


I've gotten that update finished.  The file
~ghellings/makeDownloadsStats.pl has the update.  I dumped a diff of
it to ~ghellings/makeDownloadsStats.diff.  The numbers it comes up
with are considerably higher than the current Top20 list on the site
(KJV: 7841, Total: 213281), but if you haven't had updates since
March, I suppose it's not THAT staggering.  Provided the internal
structure of the logfiles hasn't changed, the log reading should still
be accurate, since I didn't touch that - only the filename processing.
Still, it would be good if someone else took a look at those edits and
checked that it is running properly.  When I ran it, it appeared to
spend time parsing only the files applicable to the last 30 days.
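
For anyone reviewing, here is a rough sketch of the filename-selection
idea, written in Python rather than Perl.  It is not the code from
makeDownloadsStats.pl.  The function name recent_logs and its
parameters are made up for illustration, and it assumes rotated logs
are named <base>-YYYYMMDD as in Troy's example.

```python
import re
from datetime import date, timedelta


def recent_logs(names, base, days, today):
    """Return the log names whose date suffix falls within the last `days` days.

    Assumption: rotated logs are named "<base>-YYYYMMDD" and the live,
    unrotated log is named exactly "<base>".
    """
    pattern = re.compile(re.escape(base) + r"-(\d{8})$")
    cutoff = today - timedelta(days=days)
    selected = []
    for name in sorted(names):
        if name == base:
            selected.append(name)  # the live log always holds current entries
            continue
        match = pattern.match(name)
        if not match:
            continue  # unrelated file, or an old-style "<base>.N" name
        stamp = match.group(1)
        try:
            logged = date(int(stamp[:4]), int(stamp[4:6]), int(stamp[6:]))
        except ValueError:
            continue  # eight digits that do not form a real calendar date
        if logged >= cutoff:
            selected.append(name)
    return selected
```

In practice the names would come from a directory listing such as
os.listdir(log_dir), and today would be date.today(); both are passed
in here only so the selection can be checked on its own.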

I didn't remove any of the older code; I simply commented it out, so
if you want to maintain the cleanliness of the version you have in
version control, you might want to take out the commented lines before
committing the changes.


On Fri, Aug 29, 2008 at 1:13 PM,  <greg.hellings at gmail.com> wrote:
> Troy,
> From the looks of that file, editing it to process the new log file
> naming scheme is almost as simple as pulling the directory listing
> rather than iterating over the file names with an integer counter.
> I'll finish the edit this afternoon when I next access a computer.
> Greg
> On 8/24/08, Troy A. Griffitts <scribe at crosswire.org> wrote:
>> Dear Greg,
>> Thank you so much for your work.  Both you and DM had offered to help on
>> this.  As DM has a ton of other tasks, I'm sure he would appreciate it
>> if you wanted to own this.  Here is the history up to now.
>> Originally, I believe Joachim, Chris, Martin, and DM had a hand in
>> creating, improving, debugging, etc., a perl script to do module
>> statistics.  I think they worked out a good way to minimize skewed
>> numbers from multiple retries, multiple files per modules, etc.  I've
>> moved their script to ~sword/bin/ on the server and placed it under
>> version control.
>> If you'd like to own this task moving forward, you are more than
>> welcome-- and I think I can say this for all those involved in the
>> process in the past (though they can speak up if they still have a
>> heartfelt attachment to the task).  However, so as not to neglect
>> gleaning from their past work, I would like to ask you to take a look at
>> their script and see how they decided to compute the numbers.
>> This script is run from a daily cron job to produce the top20.html file
>> on sword's front page.  The arguments for the run are:
>> /home/sword/bin/makeDownloadsStats.pl /home/sword/html/top20.html 20 30
>> If your new python script could take the same params and generate a
>> similar file, it would make it easy for me to substitute it into the
>> cron job.
>> If you don't feel this is something you'd like to own, maybe DM is still
>> willing to look into updating the current perl script.
>> Thanks everyone for your recent work and work from the past on this.
>> Automation is our friend: it captures nebulous knowledge floating around
>> and places it into a solid description, and keeps humans out of the role
>> of 'bottleneck'. :)
>>       -Troy.
>> Greg Hellings wrote:
>>> Troy,
>>> I've written up a log processor for the download statistics.  It's the
>>> executable .py file in my user directory on the server.  Below is an
>>> example run of it:
>>> [ghellings at www ~]$ ./process_log.py ESV <path-to-log snipped>
>>> Total downloads: 362
>>> Unique downloads: 210
>>> It will accept as many files on the command line as you desire and
>>> report their statistics in aggregate.  This is most useful for
>>> tracking the same IP address across the multiple files.  It also
>>> works for the FTP files, but for those, relying on the
>>> total downloads is misleading, since it reports individual downloads
>>> of both new AND old testament .bz* files.  Thus, each individual
>>> download of the module should crop up as about 6 files in the "total
>>> downloads" section.  Unique downloads are based solely on IP address.
>>> As an example of the discrepancy of the counting:
>>> [ghellings at www ~]$ ./process_log.py ESV <path-to-log snipped>
>>> Total downloads: 540
>>> Unique downloads: 84
>>> Examples for comparison:
>>> [ghellings at www ~]$ ./process_log.py KJV <ftp log>
>>> Total downloads: 2098
>>> Unique downloads: 163
>>> [ghellings at www ~]$ ./process_log.py KJV <http log>
>>> Total downloads: 342
>>> Unique downloads: 198
>>> Those stats are based off of the currently in-use log files.  If you
>>> would like a version of the script that will also report all module
>>> download totals, that can be provided for little extra work.
>>> --Greg
>>> On Tue, Aug 19, 2008 at 4:14 PM, Greg Hellings <greg.hellings at gmail.com>
>>> wrote:
>>>> Troy,
>>>> On Tue, Aug 19, 2008 at 4:04 PM, Troy A. Griffitts <scribe at crosswire.org>
>>>> wrote:
>>>>> Hey guys.  We have a few needs which need addressing:
>>>>> Log files got a new naming convention recently.  Instead of:
>>>>> ffff
>>>>> ffff.1
>>>>> ffff.2
>>>>> ...
>>>>> It has become
>>>>> ffff
>>>>> ffff-20080819
>>>>> ffff-20080818
>>>>> ...
>>>>> Hence our perl scripts that generate module statistics are not working,
>>>>> seen on the left panel here:
>>>> I don't know thing 1 on Perl, so editing that is out for me.  A
>>>> rewrite is possible into Python if no one with Perl knowledge shows
>>>> up.
>>>>> http://crosswire.org/sword
>>>>> Also, Crossway asks for periodic download statistics for their ESV
>>>>> module.  I generated the last report for them by hand, but I would love
>>>>> for someone to write a script that would run on the first of each month
>>>>> and email them statistics for the previous month.
>>>> What format is the file in (I'm guessing it's an Apache file access
>>>> log)?  A simple Python script should be more than sufficient for this
>>>> purpose.  I can probably whip one up in little time.  Also, what
>>>> statistics are you in need of -- just a download count or do you also
>>>> want to have information on the unique IP address downloads, etc.?  A
>>>> sample of one line of the file (or multiple lines, if a file access is
>>>> spread across several lines) which pertains to the ESV should be
>>>> sufficient to base the work off of -- more would be appropriate if
>>>> there are multiple formats the line appears in.  Also, odds are good
>>>> that the same script can be used to generate the statistics for any
>>>> individual module.
>>>> --Greg
>>>>> Any takers?
>>>>>        -Troy.
>>>>> _______________________________________________
>>>>> sword-devel mailing list: sword-devel at crosswire.org
>>>>> http://www.crosswire.org/mailman/listinfo/sword-devel
>>>>> Instructions to unsubscribe/change your settings at above page

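As a footnote to the total-versus-unique counts in the quoted thread
above, here is a minimal sketch of that kind of counting.  It is not
the actual process_log.py.  It assumes the HTTP log is in Apache
common log format (which the thread only guesses at) and that the
module name appears in the request path; the function name
count_downloads is hypothetical.

```python
def count_downloads(lines, module):
    """Return (total, unique) download counts for `module`.

    Assumption: each line is in Apache common log format, so after
    splitting on whitespace field 0 is the client address and field 6
    is the requested path.  "Unique" means distinct client addresses.
    """
    total = 0
    clients = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # too short to be a common-log-format entry
        if module not in fields[6]:
            continue  # request is for some other file
        total += 1
        clients.add(fields[0])
    return total, len(clients)
```

Feeding it the lines of several log files at once, for example via
fileinput.input(files=paths), keeps a single set of addresses across
all of them, so a client that shows up in two files is still counted
once.  FTP transfer logs use a different layout and would need their
own field positions.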