3rd party stats file unknown encoding

Posted: Sat Oct 27, 2012 9:35 pm
by Kakao
The daily_team_summary.txt.bz2 file is now in an unknown 8-bit encoding, probably ISO-8859-2 according to Python's chardet:
Code:
>>> import chardet
>>> rawdata = open('daily_team_summary.txt', 'rb').read()
>>> chardet.detect(rawdata)
{'confidence': 0.6944575439363857, 'encoding': 'ISO-8859-2'}
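
Since chardet is only about 69% confident here, it may help to look at the non-ASCII bytes directly. The following is a minimal sketch, not part of the original report: the helper name summarize_high_bytes and the list of candidate encodings are assumptions chosen for illustration. It counts every byte at or above 0x80 and shows how that byte would render under each candidate, so a reader can judge which rendering produces plausible team names.

```python
from collections import Counter


def summarize_high_bytes(data, encodings=("iso8859_1", "iso8859_2", "cp1252")):
    """Count bytes >= 0x80 and show how each renders per candidate encoding.

    Hypothetical helper: the name and the default candidates are
    illustrative assumptions, not taken from the stats file format.
    """
    counts = Counter(b for b in data if b >= 0x80)
    report = []
    # Most frequent high bytes first; they are the most telling.
    for byte, n in counts.most_common():
        renderings = {}
        for enc in encodings:
            try:
                renderings[enc] = bytes([byte]).decode(enc)
            except UnicodeDecodeError:
                renderings[enc] = None  # byte is undefined in this encoding
        report.append((byte, n, renderings))
    return report


if __name__ == "__main__":
    # Assumes the decompressed file is in the current directory.
    with open("daily_team_summary.txt", "rb") as fh:
        for byte, n, renderings in summarize_high_bytes(fh.read()):
            print(f"0x{byte:02x} x{n}: {renderings}")
```

A byte that decodes to a sensible letter under one candidate and to a symbol or control character under another is a hint about the real encoding; it is evidence, not proof.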

This is blocking KakaoStats statistics processing. The previous encoding was ISO-8859-1 (Latin-1). Is this a mistake, or will it stay this way?

Re: 3rd party stats file unknown encoding

Posted: Sat Oct 27, 2012 9:42 pm
by Kakao
iconv can't convert it from ISO-8859-2:
Code:
$ iconv -f ISO_8859-2 -t ISO_8859-1 daily_team_summary.txt -o daily_team_summary.txt.latin1
iconv: illegal input sequence at position 35135
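
To see what iconv is objecting to, one option is to dump the bytes around the reported position. This is a rough sketch under two assumptions: that the position iconv prints is a zero-based byte offset into the input file, and that a 16-byte window on each side is enough context. The function names context_at and hex_dump are made up for this example.

```python
def context_at(data, position, radius=16):
    """Return (start_offset, window) for the bytes surrounding position.

    Hypothetical helper; the window is clamped to the bounds of data.
    """
    start = max(0, position - radius)
    end = min(len(data), position + radius)
    return start, data[start:end]


def hex_dump(window):
    """Format bytes as space-separated two-digit hex values."""
    return " ".join(f"{b:02x}" for b in window)


if __name__ == "__main__":
    # Assumes the decompressed file is in the current directory and that
    # 35135 (from the iconv message) is a zero-based byte offset.
    with open("daily_team_summary.txt", "rb") as fh:
        data = fh.read()
    start, window = context_at(data, 35135)
    print(f"bytes from offset {start}:")
    print(hex_dump(window))
    # Latin-1 maps every byte to a code point, so this never raises.
    print(repr(window.decode("iso8859_1")))
```

Decoding the window as Latin-1 is only a way to make every byte visible; it does not imply that Latin-1 is the correct encoding for the file.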

Re: 3rd party stats file unknown encoding

Posted: Sat Oct 27, 2012 9:56 pm
by P5-133XL
I pinged some people and hopefully they will address it quickly.

Re: 3rd party stats file unknown encoding

Posted: Sat Oct 27, 2012 10:46 pm
by Kakao
I fixed it. My mistake.

Re: 3rd party stats file unknown encoding

Posted: Sat Oct 27, 2012 10:52 pm
by Jesse_V
Kakao wrote: I fixed it.

Thanks! :)

Re: 3rd party stats file unknown encoding

Posted: Sat Oct 27, 2012 11:15 pm
by VijayPande
OK, so everything is set on your end, then?

Re: 3rd party stats file unknown encoding

Posted: Sat Oct 27, 2012 11:28 pm
by Kakao
Yes. Sorry for the noise.