3rd party stats file unknown encoding

Posted: Sat Oct 27, 2012 8:35 pm
by Kakao
The daily_team_summary.txt.bz2 file is now encoded in an unknown 8-bit encoding, likely ISO-8859-2 according to Python's chardet:

Code:

>>> import chardet
>>> rawdata = open('daily_team_summary.txt', 'rb').read()  # binary mode: chardet.detect expects raw bytes
>>> chardet.detect(rawdata)
{'confidence': 0.6944575439363857, 'encoding': 'ISO-8859-2'}
This is preventing KakaoStats from processing the statistics. The previous encoding was ISO-8859-1 (latin1). Is this a mistake, or will it stay as is?
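
A chardet confidence of about 0.69 is only a guess, so a cheaper first check is whether the bytes are plain ASCII, valid UTF-8, or something else. The following is a minimal sketch of that check, not part of KakaoStats; the function names classify_bytes and classify_file are hypothetical, and the file name is the one from the post above.

```python
def classify_bytes(data: bytes) -> str:
    """Roughly classify raw bytes as 'ascii', 'utf-8', or '8-bit'."""
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Neither ASCII nor valid UTF-8: some legacy 8-bit encoding
        # whose identity can only be guessed (e.g. with chardet).
        return "8-bit"


def classify_file(path: str) -> str:
    """Read a file in binary mode and classify its contents."""
    with open(path, "rb") as handle:
        return classify_bytes(handle.read())


# Example call (assumes the uncompressed file is in the current directory):
# classify_file("daily_team_summary.txt")
```

If the result is "8-bit", the file is not UTF-8 and a detector's answer still needs to be confirmed by looking at the actual team names.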

Re: 3rd party stats file unknown encoding

Posted: Sat Oct 27, 2012 8:42 pm
by Kakao
iconv can't convert it from ISO-8859-2:

Code:

$ iconv -f ISO_8859-2 -t ISO_8859-1 daily_team_summary.txt -o daily_team_summary.txt.latin1
iconv: illegal input sequence at position 35135
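
To see which bytes trip up a conversion like the one above, the same round trip can be tried character by character in Python. This is a rough sketch under stated assumptions: the source is treated as a single-byte encoding (so character index equals byte offset), the codec names "iso8859_2" and "latin-1" are Python's, and unconvertible_offsets is a hypothetical helper. It does not reproduce iconv's exact rules or explain what was wrong with this particular file.

```python
def unconvertible_offsets(data: bytes, source: str = "iso8859_2",
                          target: str = "latin-1", limit: int = 5):
    """Return up to `limit` (offset, byte value, character) tuples for
    characters that decode from `source` but cannot be encoded in `target`.

    Assumes `source` is a single-byte encoding, so the index of a decoded
    character is also the byte offset in `data`.
    """
    hits = []
    text = data.decode(source)
    for offset, char in enumerate(text):
        try:
            char.encode(target)
        except UnicodeEncodeError:
            hits.append((offset, data[offset], char))
            if len(hits) >= limit:
                break
    return hits


# Example call (assumes the uncompressed file is in the current directory):
# with open("daily_team_summary.txt", "rb") as handle:
#     for offset, value, char in unconvertible_offsets(handle.read()):
#         print(offset, hex(value), repr(char))
```

An empty result would mean every character has a latin1 equivalent under the assumed source encoding; a non-empty one gives concrete offsets to inspect in a hex viewer.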

Re: 3rd party stats file unknown encoding

Posted: Sat Oct 27, 2012 8:56 pm
by P5-133XL
I pinged some people and hopefully they will address it quickly.

Re: 3rd party stats file unknown encoding

Posted: Sat Oct 27, 2012 9:46 pm
by Kakao
I fixed it. My mistake.

Re: 3rd party stats file unknown encoding

Posted: Sat Oct 27, 2012 9:52 pm
by Jesse_V
Kakao wrote: I fixed it.
Thanks! :)

Re: 3rd party stats file unknown encoding

Posted: Sat Oct 27, 2012 10:15 pm
by VijayPande
OK, so everything is set on your end then?

Re: 3rd party stats file unknown encoding

Posted: Sat Oct 27, 2012 10:28 pm
by Kakao
Yes. Sorry for the noise.