[sword-devel] Character Frequency

Greg Hellings greg.hellings at gmail.com
Mon Jul 4 11:10:40 MST 2011


Fixed:
$ count.py kjv.xml
Code point      Character       Name            Count
000020                                     SPACE 1669596
000022          "                 QUOTATION MARK 1661832
00006F          o           LATIN SMALL LETTER O 1330866
000072          r           LATIN SMALL LETTER R 1307266
000073          s           LATIN SMALL LETTER S 1172801
000065          e           LATIN SMALL LETTER E 1156121
00006E          n           LATIN SMALL LETTER N 1092384
00006D          m           LATIN SMALL LETTER M 1029125
000074          t           LATIN SMALL LETTER T 901465
00003C          <                 LESS-THAN SIGN 864037
00003E          >              GREATER-THAN SIGN 864037
00003D          =                    EQUALS SIGN 830916
000061          a           LATIN SMALL LETTER A 776214
000077          w           LATIN SMALL LETTER W 772641
000068          h           LATIN SMALL LETTER H 625029
00003A          :                          COLON 609087
000067          g           LATIN SMALL LETTER G 560652
00006C          l           LATIN SMALL LETTER L 497519
00002F          /                        SOLIDUS 469056
000069          i           LATIN SMALL LETTER I 406801
000030          0                     DIGIT ZERO 393184
000070          p           LATIN SMALL LETTER P 370919
000031          1                      DIGIT ONE 350731
000048          H         LATIN CAPITAL LETTER H 312386
000032          2                      DIGIT TWO 290358
000038          8                    DIGIT EIGHT 283469
000033          3                    DIGIT THREE 263960
000064          d           LATIN SMALL LETTER D 257239
00002E          .                      FULL STOP 220707
000035          5                     DIGIT FIVE 209066
000062          b           LATIN SMALL LETTER B 204056
000034          4                     DIGIT FOUR 197713
000063          c           LATIN SMALL LETTER C 197400
000037          7                    DIGIT SEVEN 193701
000036          6                      DIGIT SIX 183464
000047          G         LATIN CAPITAL LETTER G 175932
000039          9                     DIGIT NINE 172006
00002D          -                   HYPHEN-MINUS 152074
000049          I         LATIN CAPITAL LETTER I 133127
00004D          M         LATIN CAPITAL LETTER M 126782
000044          D         LATIN CAPITAL LETTER D 121721
00004E          N         LATIN CAPITAL LETTER N 115182
000076          v           LATIN SMALL LETTER V 114636
000054          T         LATIN CAPITAL LETTER T 113384
000075          u           LATIN SMALL LETTER U 111775
000079          y           LATIN SMALL LETTER Y 109108
000050          P         LATIN CAPITAL LETTER P 107290
000041          A         LATIN CAPITAL LETTER A 94242
000053          S         LATIN CAPITAL LETTER S 85226
000066          f           LATIN SMALL LETTER F 84923
00002C          ,                          COMMA 74768
000043          C         LATIN CAPITAL LETTER C 73229
00004A          J         LATIN CAPITAL LETTER J 39531
000056          V         LATIN CAPITAL LETTER V 36203
00006B          k           LATIN SMALL LETTER K 35707
00000A                                 not found 34899
000045          E         LATIN CAPITAL LETTER E 25991
000052          R         LATIN CAPITAL LETTER R 24737
000046          F         LATIN CAPITAL LETTER F 23948
00004F          O         LATIN CAPITAL LETTER O 20676
000078          x           LATIN SMALL LETTER X 18179
00004C          L         LATIN CAPITAL LETTER L 16367
00003B          ;                      SEMICOLON 10159
00007A          z           LATIN SMALL LETTER Z 6930
00004B          K         LATIN CAPITAL LETTER K 5389
000042          B         LATIN CAPITAL LETTER B 5047
00003F          ?                  QUESTION MARK 3421
000058          X         LATIN CAPITAL LETTER X 3283
002026          …            HORIZONTAL ELLIPSIS 3115
0000B6          ¶                   PILCROW SIGN 2970
00006A          j           LATIN SMALL LETTER J 2596
000057          W         LATIN CAPITAL LETTER W 2489
000071          q           LATIN SMALL LETTER Q 2334
000027          '                     APOSTROPHE 2040
00005A          Z         LATIN CAPITAL LETTER Z 1776
002013          –                        EN DASH 920
000055          U         LATIN CAPITAL LETTER U 797
000059          Y         LATIN CAPITAL LETTER Y 551
000021          !               EXCLAMATION MARK 313
000028          (               LEFT PARENTHESIS 240
000029          )              RIGHT PARENTHESIS 240
000051          Q         LATIN CAPITAL LETTER Q 199
0000E6          æ          LATIN SMALL LETTER AE 93
00007B          {             LEFT CURLY BRACKET 5
00007D          }            RIGHT CURLY BRACKET 5
0000C6          Æ        LATIN CAPITAL LETTER AE 3
0005D1          ב              HEBREW LETTER BET 1
0005D5          ו              HEBREW LETTER VAV 1
0005D9          י              HEBREW LETTER YOD 1
0005E1          ס           HEBREW LETTER SAMEKH 1
0005E9          ש             HEBREW LETTER SHIN 1
0005D2          ג            HEBREW LETTER GIMEL 1
0005D6          ז            HEBREW LETTER ZAYIN 1
0005DE          מ              HEBREW LETTER MEM 1
0005E2          ע             HEBREW LETTER AYIN 1
0005E6          צ            HEBREW LETTER TSADI 1
0005EA          ת              HEBREW LETTER TAV 1
0005D3          ד            HEBREW LETTER DALET 1
0005D7          ח              HEBREW LETTER HET 1
0005DB          כ              HEBREW LETTER KAF 1
0005E7          ק              HEBREW LETTER QOF 1
002015          ―                 HORIZONTAL BAR 1
0005D0          א             HEBREW LETTER ALEF 1
0005D4          ה               HEBREW LETTER HE 1
0005D8          ט              HEBREW LETTER TET 1
0005DC          ל            HEBREW LETTER LAMED 1
0005E0          נ              HEBREW LETTER NUN 1
0005E4          פ               HEBREW LETTER PE 1
0005E8          ר             HEBREW LETTER RESH 1
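
The count.py script itself is not included in this message. As a rough illustration only, a tally like the one above could be produced along these lines in Python 3. The function names, the column widths, and the choice to blank out unprintable characters are assumptions made for this sketch, not a reconstruction of the actual script. The "not found" fallback mirrors the U+000A row, since control characters have no Unicode name.

```python
import sys
import unicodedata
from collections import Counter


def count_characters(text):
    """Return a Counter mapping each character in text to its occurrence count."""
    return Counter(text)


def format_row(char, count):
    """Format one table row: code point, character, Unicode name, count."""
    # unicodedata.name() raises ValueError for code points without a name
    # (for example control characters such as U+000A) unless a default is given.
    name = unicodedata.name(char, "not found")
    # Assumption for this sketch: print a blank instead of an unprintable
    # character, so a literal newline cannot split the row across two lines.
    shown = char if char.isprintable() else " "
    return f"{ord(char):06X}          {shown} {name:>30} {count}"


def main(path):
    # Assumes the input file is UTF-8 encoded.
    with open(path, encoding="utf-8") as handle:
        counts = count_characters(handle.read())
    print("Code point      Character       Name            Count")
    # most_common() yields (character, count) pairs in descending count order.
    for char, count in counts.most_common():
        print(format_row(char, count))


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Saved under a hypothetical name such as count_sketch.py, it would be run as `python3 count_sketch.py kjv.xml`. Note that it counts every character in the file, markup included, which is consistent with the high counts for `<`, `>`, `=` and `"` in the table above.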

--Greg

On Mon, Jul 4, 2011 at 10:41 AM, David Haslam <dfhmch at googlemail.com> wrote:
> Output is a tad less descriptive than that from BabelPad.
>
> Here's the first 25 lines from a file I was working on.
>
> /For files with long character names, best to use a wider tab setting in
> one's editor./
>
> Code point  Character  Character Name       Count
> 000020                 SPACE              609,105
> 000021      !          EXCLAMATION MARK     2,009
> 000022      "          QUOTATION MARK       2,245
> 000027      '          APOSTROPHE             199
> 000028      (          LEFT PARENTHESIS        93
> 000029      )          RIGHT PARENTHESIS       93
> 00002A      *          ASTERISK             3,500
> 00002B      +          PLUS SIGN               66
> 00002C      ,          COMMA               73,327
> 00002D      -          HYPHEN-MINUS           901
> 00002E      .          FULL STOP           22,991
> 000030      0          DIGIT ZERO           2,822
> 000031      1          DIGIT ONE           14,709
> 000032      2          DIGIT TWO           10,486
> 000033      3          DIGIT THREE          6,626
> 000034      4          DIGIT FOUR           4,786
> 000035      5          DIGIT FIVE           3,897
> 000036      6          DIGIT SIX            3,478
> 000037      7          DIGIT SEVEN          3,230
> 000038      8          DIGIT EIGHT          3,062
> 000039      9          DIGIT NINE           2,920
> 00003A      :          COLON               10,445
> 00003B      ;          SEMICOLON           11,513
> 00003F      ?          QUESTION MARK        3,010


