python - Re-encode Unicode stream as ASCII ignoring errors
I am trying to take a Unicode file stream, which contains odd characters, and wrap it with a stream reader that will convert it to ASCII, ignoring or replacing all the characters that cannot be encoded.
My stream looks like this:

    "EventId","Rate","Attribute1","Attribute2","(�� ?? ½½¡|¾ ?? "
    ...

My attempt to alter the stream on the fly looks like this:
    import chardet, io, codecs

    with open(self.csv_path, 'rb') as rawdata:
        detected = chardet.detect(rawdata.read(1000))

    detectedEncoding = detected['encoding']
    with io.open(self.csv_path, 'r', encoding=detectedEncoding) as csv_file:
        csv_ascii_stream = codecs.getreader('ascii')(csv_file, errors='ignore')
        log(csv_ascii_stream.read())

The result on the log line is:

    UnicodeEncodeError: 'ascii' codec can't encode characters in position 36-40: ordinal not in range(128)

even though I explicitly constructed the StreamReader with errors='ignore'. I would like the resulting stream (after reading) to come out as:

    "EventId","Rate","Attribute1","Attribute2","(?? ???)? "
    ...

or alternatively, with the characters dropped instead of replaced:

    "EventId","Rate","Attribute1","Attribute2","()"

Why is the exception raised anyway?
I have seen plenty of problems and solutions for decoding strings, but my challenge is to change the stream as it is being read (using .next()), because the file is potentially too large to load into memory all at once using .read().
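For reference, the two desired outputs are what Python's 'replace' and 'ignore' error handlers produce on the encode side (str to bytes). A minimal Python 3 sketch; the sample row is hypothetical, standing in for the odd characters in the question:

```python
# Hypothetical sample row; the non-ASCII characters stand in for the
# "strange characters" in the question.
row = '"EventId","Rate","Attribute1","Attribute2","(( \u00bd\u00a1\u00be ))"'

# 'replace' substitutes b'?' for each character ASCII cannot represent.
replaced = row.encode('ascii', 'replace')

# 'ignore' silently drops those characters.
ignored = row.encode('ascii', 'ignore')

print(replaced)  # b'"EventId","Rate","Attribute1","Attribute2","(( ??? ))"'
print(ignored)   # b'"EventId","Rate","Attribute1","Attribute2","((  ))"'
```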
You're mixing up the encode and decode sides.
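A minimal Python 3 sketch of that mix-up, assuming io.StringIO as a stand-in for the text-mode file returned by io.open(). Because Python 3 never mixes str and bytes implicitly, the StreamReader fails with a TypeError as soon as it receives str instead of bytes, and errors='ignore' never comes into play. In Python 2 the same mistake presumably surfaced as the question's UnicodeEncodeError instead, since unicode handed to a byte decoder was implicitly encoded with the strict ASCII codec first.

```python
import codecs
import io

# io.StringIO stands in for the text-mode file returned by io.open():
# reading from it yields already-decoded str, not bytes.
text_stream = io.StringIO('caf\u00e9\n')

# codecs.getreader() returns a StreamReader, i.e. a *decoder*; it expects
# the wrapped stream to produce bytes.
ascii_reader = codecs.getreader('ascii')(text_stream, errors='ignore')

raised = None
try:
    ascii_reader.read()
except TypeError as exc:
    # The reader tries to join its internal bytes buffer with the str it
    # just read, so it fails before the 'ignore' handler is consulted.
    raised = exc

print(type(raised).__name__)  # TypeError
```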
For decoding, you're doing fine: you open the file as binary data, run chardet on the first 1K, then reopen it in text mode using the detected encoding.

But then you're trying to further decode that already-decoded data as ASCII, by using codecs.getreader. That function returns a StreamReader, which decodes data from a stream. That isn't going to work; you need to encode the data to ASCII.

But it's not clear why you're using a codecs stream decoder or encoder in the first place, when all you want to do is encode a single chunk of text in one go so you can log it. Why not just call the encode method?

    log(csv_file.read().encode('ascii', 'ignore'))

If you want something you can use as a lazy iterable of lines, you could build something fully general, but it's a lot simpler to do something like the UTF8Recoder example in the csv docs:

    class AsciiRecoder:
        def __init__(self, f, encoding):
            # f is the file opened in binary mode; decode it with the detected encoding
            self.reader = codecs.getreader(encoding)(f)
        def __iter__(self):
            return self
        def next(self):
            # Python 2 iterator protocol (Python 3 uses __next__):
            # read one decoded line, then encode it to ASCII, dropping what can't be encoded
            return self.reader.next().encode('ascii', 'ignore')

Or, even more simply:
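The AsciiRecoder idea translated to Python 3 might look like the sketch below. It is an adaptation, not the answer's exact code: __next__ replaces the Python 2 next() method, and the errors parameter is an addition so that 'replace' can be chosen instead of 'ignore'. As with UTF8Recoder, f is assumed to be the file opened in binary mode; io.BytesIO stands in for it in the usage lines.

```python
import codecs
import io


class AsciiRecoder:
    """Iterate over a binary stream, yielding each line as ASCII bytes."""

    def __init__(self, f, encoding, errors='ignore'):
        # Decode side: wrap the *binary* stream in a StreamReader for the
        # source encoding (e.g. the one chardet detected).
        self.reader = codecs.getreader(encoding)(f)
        # Encode side: error handler for characters ASCII can't represent.
        self.errors = errors

    def __iter__(self):
        return self

    def __next__(self):
        # next() on the StreamReader returns one decoded line (str) and
        # raises StopIteration at end of stream, which ends our iteration too.
        return next(self.reader).encode('ascii', self.errors)


# Usage: io.BytesIO stands in for a file opened with open(path, 'rb').
data = '"EventId","Rate"\n"(( \u00bd\u00be ))","1"\n'.encode('utf-8')
ignored = list(AsciiRecoder(io.BytesIO(data), 'utf-8'))
replaced = list(AsciiRecoder(io.BytesIO(data), 'utf-8', errors='replace'))
print(ignored)   # [b'"EventId","Rate"\n', b'"((  ))","1"\n']
print(replaced)  # [b'"EventId","Rate"\n', b'"(( ?? ))","1"\n']
```

Only one decoded line is held at a time, so this keeps the lazy, line-by-line behaviour the question asks for.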
    with io.open(self.csv_path, 'r', encoding=detectedEncoding) as csv_file:
        csv_ascii_stream = (line.encode('ascii', 'ignore') for line in csv_file)
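One caveat with the generator expression: it reads lazily from csv_file, so it has to be consumed inside the with block, before the file is closed. A Python 3 sketch that sidesteps this by moving the with into a generator function (ascii_lines is a hypothetical helper name), and, since Python 3's csv.reader expects str rather than bytes, decodes each ASCII line back before parsing. The temporary file and its contents are illustrative only.

```python
import csv
import io
import os
import tempfile


def ascii_lines(path, encoding, errors='ignore'):
    """Lazily yield each line of a text file re-encoded as ASCII bytes.

    The file stays open until iteration finishes, unlike a bare generator
    expression that is consumed after its `with` block has exited.
    """
    # newline='' as the csv docs recommend for files read by csv.reader.
    with io.open(path, 'r', encoding=encoding, newline='') as csv_file:
        for line in csv_file:
            yield line.encode('ascii', errors)


# Usage sketch with a temporary UTF-8 file (hypothetical sample data).
with tempfile.NamedTemporaryFile('wb', suffix='.csv', delete=False) as tmp:
    tmp.write('"EventId","Rate"\n"(( \u00bd\u00be ))","1"\n'.encode('utf-8'))
    path = tmp.name

try:
    # csv.reader in Python 3 wants str, so decode the ASCII bytes back.
    rows = list(csv.reader(line.decode('ascii')
                           for line in ascii_lines(path, 'utf-8')))
finally:
    os.remove(path)

print(rows)  # [['EventId', 'Rate'], ['((  ))', '1']]
```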