windows - R: can't read unicode text files even when specifying the encoding -

- March 15, 2013

I am using R 3.1.1 on 32-bit Windows 7. I am having a lot of trouble reading some text files on which I want to do text analysis. According to Notepad++, the files are encoded as "UCS-2 Little Endian". (grepWin, a tool whose name says it all, says the file is "Unicode".)

The problem is that I cannot read the file even when I specify that encoding. (The characters are from the standard Spanish Latin set, ñ, á, ó, and should be handled easily by CP1252 or something similar.)

    > Sys.getlocale()
    [1] "LC_COLLATE=Spanish_Spain.1252;LC_CTYPE=Spanish_Spain.1252;LC_MONETARY=Spanish_Spain.1252;LC_NUMERIC=C;LC_TIME=Spanish_Spain.1252"
    > readLines("filename.txt")
    [1] "ÿþE" "" "" "" "" ...
    > readLines("filename.txt", encoding = "UTF-8")
    [1] "\xff\xfeE" "" "" "" "" ...
    > readLines("filename.txt", encoding = "UCS2LE")
    [1] "ÿþE" "" "" "" "" "" "" ...
    > readLines("filename.txt", encoding = "UCS-2")
    [1] "ÿþE" "" "" "" "" ...

Any ideas?

Thank you!

Edit: the "UTF-16", "UTF-16LE" and "UTF-16BE" encodings fail in the same way.
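
A note on the output above: the bytes \xff\xfe at the start of the first string are the UTF-16 little-endian byte order mark, which suggests the file's bytes are reaching R unconverted. A minimal sketch for checking the first bytes of a file follows. The helper name guess_bom and the sample file are made up for illustration and are not part of the original post; the check is deliberately rough and does not handle UTF-32.

```r
# Hypothetical helper (not from the original post): guess a Unicode
# encoding from the byte order mark at the start of a file.
# UTF-32 is not handled; its little-endian BOM also starts with FF FE.
guess_bom <- function(path) {
  bytes <- readBin(path, what = "raw", n = 4L)
  starts_with <- function(sig) {
    length(bytes) >= length(sig) && all(bytes[seq_along(sig)] == sig)
  }
  if (starts_with(as.raw(c(0xEF, 0xBB, 0xBF)))) return("UTF-8")
  if (starts_with(as.raw(c(0xFF, 0xFE)))) return("UTF-16LE")
  if (starts_with(as.raw(c(0xFE, 0xFF)))) return("UTF-16BE")
  NA_character_
}

# Build a tiny sample file that starts with FF FE followed by "E",
# mimicking the first bytes seen in the question.
sample_path <- tempfile(fileext = ".txt")
writeBin(as.raw(c(0xFF, 0xFE, 0x45, 0x00, 0x0D, 0x00, 0x0A, 0x00)), sample_path)
print(guess_bom(sample_path))
```

If a check like this reports UTF-16LE, the file needs to be re-encoded while it is read, not merely labelled afterwards.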
After reading the documentation more closely, I found the answer to my own question.

The encoding argument of readLines only applies to the input strings. The documentation says:

  Encoding to be assumed for input strings. It is used to mark character strings as known to be in Latin-1 or UTF-8: it is not used to re-encode the input. To do the latter, specify the encoding as part of the connection con or via options(encoding=): see the examples.

The proper way to read a file with an uncommon encoding is, then:

    # Declare the encoding on the connection so the input is re-encoded while reading
    filetext <- readLines(con <- file("UnicodeFile.txt", encoding = "UCS-2LE"))
    close(con)
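
To see the difference between the two approaches without the original file, here is a small self-contained sketch. It writes a made-up UTF-16LE sample (the path, variable names and contents are assumptions for illustration only), then reads it once with the encoding passed to readLines and once with the encoding declared on the connection. It assumes the session's native encoding can represent "ñ", which holds for CP1252 and UTF-8 locales.

```r
# Build a small UTF-16LE file with a byte order mark, similar to what
# Notepad++ labels "UCS-2 Little Endian". Contents are made up.
code_points <- c(0xFEFF, utf8ToInt("Hola\nA\u00f1o\n"))
# Little endian: low byte first, then high byte, for each code point.
utf16le_bytes <- as.raw(as.vector(rbind(code_points %% 256, code_points %/% 256)))
sample_path <- tempfile(fileext = ".txt")
writeBin(utf16le_bytes, sample_path)

# Passing encoding to readLines() only marks the resulting strings;
# the bytes are not converted, so the result is still unusable.
naive <- suppressWarnings(readLines(sample_path, encoding = "UTF-8"))

# Declaring the encoding on the connection makes R re-encode the input
# into the session's native encoding while reading.
con <- file(sample_path, encoding = "UCS-2LE")
filetext <- readLines(con)
close(con)

print(filetext)
```

Opening the connection separately, as here, instead of nesting the assignment inside the readLines call, makes it a little easier to see which connection gets closed.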
