for bereishit.txt
letterCount: 78063
wordCount: 20614
verse count: 1533
double count: 30
1:26 1:29 1:29 6:9 7:2 7:3 7:9 7:14 7:15 7:19 7:21 8:17 11:10 11:27 12:1 14:10 20:18 22:11 25:19 25:30 27:30 30:43 32:17 36:31 37:33 39:10 43:3 43:14 44:28 46:2
--------------
for shmot.txt
letterCount: 63527
wordCount: 16714
verse count: 1209
double count: 14
2:19 3:4 7:17 8:10 15:1 15:21 15:25 16:5 16:21 23:30 30:7 34:6 36:3 36:4
--------------
for vayikra.txt
letterCount: 44790
wordCount: 11950
verse count: 859
double count: 20
5:19 6:5 10:16 11:41 11:42 11:43 13:38 15:2 17:3 17:8 17:10 17:13 18:6 19:34 20:2 20:2 20:9 22:4 22:18 24:15
--------------
for bamidbar.txt
letterCount: 63529
wordCount: 16408
verse count: 1288
double count: 20
1:4 1:44 3:9 3:47 4:19 4:49 5:12 5:22 7:86 8:16 9:10 12:14 14:7 14:34 17:17 17:28 28:21 28:29 29:10 35:26
--------------
for devarim.txt
letterCount: 54892
wordCount: 14295
verse count: 955
double count: 7
2:27 7:22 14:22 16:20 28:43 28:43 32:39
--------------
Total counts
letters 304801
words 79981
verses 5844
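The per-book figures above could be produced along the following lines. This is only a sketch, not the script that generated the output: `book_stats` and its input format (an iterable of `(reference, verse_text)` pairs) are hypothetical. It also assumes that a "double" means two adjacent words in a verse with identical letters, and that words joined by a maqaf (U+05BE) count separately. Both choices affect the totals.

```python
from typing import Dict, Iterable, List, Tuple, Union

ALEF, TAV = 0x05D0, 0x05EA  # Hebrew letters occupy code points 1488..1514
MAQAF = "\u05be"            # Hebrew hyphen; ASSUMPTION: it separates two words


def is_hebrew_letter(ch: str) -> bool:
    return ALEF <= ord(ch) <= TAV


def book_stats(
    verses: Iterable[Tuple[str, str]]
) -> Dict[str, Union[int, List[str]]]:
    """Count letters, words, verses and adjacent repeated words.

    `verses` yields (reference, text) pairs such as ("22:11", "...").
    """
    letters = words = verse_count = 0
    doubles: List[str] = []
    for ref, text in verses:
        verse_count += 1
        # Strip everything except the 27 letters so that vowel points,
        # cantillation marks and punctuation do not affect the comparison.
        tokens = [
            "".join(c for c in tok if is_hebrew_letter(c))
            for tok in text.replace(MAQAF, " ").split()
        ]
        tokens = [t for t in tokens if t]
        words += len(tokens)
        letters += sum(len(t) for t in tokens)
        # One entry per adjacent identical pair, so a verse with two
        # doubles is listed twice (as 1:29 is in the output above).
        for a, b in zip(tokens, tokens[1:]):
            if a == b:
                doubles.append(ref)
    return {
        "letterCount": letters,
        "wordCount": words,
        "verse count": verse_count,
        "doubles": doubles,
    }
```

Because the comparison is on consonants only, two words that are vowelled differently but spelled the same are reported as a double.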
let:      uni:   count:
-----------------------
alef:     1488   27060
bet:      1489   16345
gimel:    1490   2109
daled:    1491   7032
heh:      1492   28055
vav:      1493   30509
zayin:    1494   2198
chet:     1495   7189
tet:      1496   1804
yud:      1497   31531
chaf s:   1498   3358
chaf:     1499   8610
lamed:    1500   21570
mem s:    1501   10624
mem:      1502   14466
nun s:    1503   4259
nun:      1504   9867
samech:   1505   1833
ayin:     1506   11250
peh s:    1507   830
peh:      1508   3975
tzadi s:  1509   1035
tzadi:    1510   2927
kuf:      1511   4695
resh:     1512   18125
shin:     1513   15595
tav:      1514   17950
total: 304801
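A table like the one above can be built by tallying characters by Unicode code point. The sketch below is illustrative rather than the original program: `letter_counts` and `format_table` are hypothetical names, and it assumes the "uni" column is the decimal code point, with the final ("s") forms counted separately from the regular forms.

```python
from collections import Counter

# Names in code-point order, starting at alef (U+05D0 = 1488).
# "s" marks a final (sofit) form, matching the labels in the table.
LETTER_NAMES = [
    "alef", "bet", "gimel", "daled", "heh", "vav", "zayin", "chet", "tet",
    "yud", "chaf s", "chaf", "lamed", "mem s", "mem", "nun s", "nun",
    "samech", "ayin", "peh s", "peh", "tzadi s", "tzadi", "kuf", "resh",
    "shin", "tav",
]
FIRST = 0x05D0  # 1488


def letter_counts(text: str) -> Counter:
    """Tally Hebrew letters by code point, ignoring everything else."""
    return Counter(
        ord(ch) for ch in text
        if FIRST <= ord(ch) < FIRST + len(LETTER_NAMES)
    )


def format_table(counts: Counter) -> str:
    """Render name, code point and count, followed by the grand total."""
    lines = [
        f"{name + ':':<10}{FIRST + i:<7}{counts.get(FIRST + i, 0)}"
        for i, name in enumerate(LETTER_NAMES)
    ]
    lines.append(f"total: {sum(counts.values())}")
    return "\n".join(lines)
```

Calling `print(format_table(letter_counts(text)))` on the concatenated text of the five books would give a table in this layout. The total line is a useful check against the sum of the per-book letter counts.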
There are many differences between these letter frequencies and those in the aishdas article: http://www.aishdas.org/toratemet/en_pamphlet9.html
Yeah. It's not surprising, since aishdas is reporting results that were hand counted. Most are off by one or two. The size of the error seems unrelated to how common the letter is (you might expect larger raw errors on vavs and hehs than on tets). One letter, tzadi, which is not a common one, is off by exactly 100, which to me looks like a scribal error.
Where did you get this digital version of the Torah, and could you make a copy of it available for download?
My current digital version has 304,805 letters total, and I would like to see what the differences are between my version and your version.
Is there a text file of all 304,805 letters in one string available online? I'm surprised no one has ever done it before, with so many computers around.
@Anon
I scraped the digital version from mechon-mamre.org. It required some editing to remove the verse headings. I think I did all of this correctly, but I could have erred.
Thanks, that explains the difference. mechon-mamre.org uses a Yemenite edition of the Torah which has 304,801 letters, as opposed to the 304,805 letters in the editions used by Ashkenazi/Sephardic communities.
Is it just a difference of 4 letters within words? If so, any idea which letters in which words?
I came across your blog just now whilst trying to clarify whether the words Darosh Darash are the middle words of the Torah, as a number of sources suggest, or whether, as others suggest, they are not the middle words but the middle pair of duplicate words. I checked against my own digital Torah text, which indicated that they are clearly not the middle words (but could be the middle duplicate pair). I made my digital version as follows: I took the Torah text from mechon-mamre.org a few years ago and applied a word count. I then compared it to an alternate online source (I don't recall which one) and cross-referenced any discrepancies, of which there were a few, against a printed Koren Torah. A number of differences were caused by the way some hyphenated words copied from Mechon Mamre, and a comparably small number by pairs of words that were either joined and needed separating or separated and needed joining. Overall I ended up with 4 words fewer than you: 79977 words, with the middle word being word 39988, EL (Alef Lamed), in Vayikra chapter 8, verse 15. Any ideas where the suggestion came from that the middle words are Darosh Darash? Was it from someone misquoting the fact that they are actually the middle pair of duplicate words, which happens to be within approx. 900 words of the actual centre?
Simon
See Kefirah's main post on this topic:
https://kefirahoftheweek.blogspot.com/2015/04/the-center-of-torah.html
Lots of info there.