multithreading - Can I accelerate python file method `read()` by parallelism?


I have many files to read (300 ~ 500), and I want to speed up this task.

Here is the illustration:

  import os
  import _io
  from multiprocessing import Pool

  filelist = map(open, os.listdir())

  if __name__ == '__main__':
      with Pool() as pool:
          a = pool.map(_io.TextIOWrapper.read, filelist)

Of course, I got an error:

  TypeError: cannot serialize '_io.TextIOWrapper' object
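For context, here is a minimal sketch (my addition, not from the original post) of why this happens: multiprocessing ships arguments to worker processes with pickle, and an open file object cannot be pickled, whereas the string returned by read() can. The file name below is a placeholder.

  import pickle

  # A plain string pickles fine, so returning file *contents* from a worker works.
  pickle.dumps("some file contents")

  # An open file handle cannot be pickled; the exact message varies by Python
  # version ("cannot serialize" / "cannot pickle" '_io.TextIOWrapper' object).
  try:
      pickle.dumps(open("some_file.txt"))   # placeholder file name
  except TypeError as e:
      print(e)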

The question is: can I speed up the I/O process with parallelism? If so, how?

Updated findings:

Now I have found a way to parallelize it, and my code has been tested:

I used 22 files, 63.2 MB in total, for the test:

  from multiprocessing import Pool
  import os

  def my_read(file_name):
      with open(file_name) as f:
          return f.read()

  def mul():
      with Pool() as pool:
          a = pool.map(my_read, os.listdir())

  def single():
      a = []
      for i in os.listdir():
          with open(i) as f:
              r = f.read()
              a.append(r)

Unfortunately, single() costs 0.4 s while mul() costs 0.8 s.
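(Not part of the original post.) A minimal way to reproduce such a comparison, assuming the single() and mul() functions above are defined in the same script and the working directory contains only ordinary text files:

  import time

  if __name__ == '__main__':
      start = time.perf_counter()
      single()
      print('single():', time.perf_counter() - start, 's')

      start = time.perf_counter()
      mul()
      print('mul():', time.perf_counter() - start, 's')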

Update 1:

Some people said that this is an I/O-bound task, so I cannot improve it by parallelism. However, I found these words in the documentation:

  However, threading is still a suitable model if you want to run multiple I/O-bound tasks together.
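One hedged reading of that advice (my sketch, not code from the post): use threads rather than processes for the I/O-bound part, e.g. with multiprocessing.pool.ThreadPool, so no pickling and no extra process start-up is involved:

  import os
  from multiprocessing.pool import ThreadPool  # thread-based counterpart of Pool

  def my_read(file_name):
      # Open and close the file inside the worker thread; only the string comes back.
      with open(file_name) as f:
          return f.read()

  if __name__ == '__main__':
      # Assumes the current directory contains only readable text files.
      with ThreadPool() as pool:
          contents = pool.map(my_read, os.listdir())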

The complete code is here:

My intention is to convert epub files to txt.

I have parallelized char2text, and now I want to parallelize readall:

  import zipfile
  from multiprocessing import Pool
  import bs4

  def char2text(i):
      soup = bs4.BeautifulSoup(i)
      chapters = soup.body.getText().splitlines()
      chapter = "\n".join(chapters).strip() + "\n\n"
      return chapter

  class Epub(zipfile.ZipFile):
      def __init__(self, file, mode='r', compression=0, allowZip64=False):
          zipfile.ZipFile.__init__(self, file, mode, compression, allowZip64)
          if mode == 'r':
              self.opf = self.read('OEBPS/content.opf').decode()
              opf_soup = bs4.BeautifulSoup(self.opf)
              self.author = opf_soup.find(name='dc:creator').getText()
              self.title = opf_soup.find(name='dc:title').getText()
              try:
                  self.description = opf_soup.find(name='dc:description').getText()
              except:
                  self.description = ''
              try:
                  self.chrpattern = opf_soup.find(name='dc:chrpattern').getText()
              except:
                  self.chrpattern = ''
              self.cover = self.read('OEBPS/images/cover.jpg')
          elif mode == 'w':
              pass

      def get_text(self):
          self.tempread = ""
          charlist = self.readall(self.namelist())
          with Pool() as pool:
              txtlist = pool.map(char2text, charlist)
          self.tempread = "".join(txtlist)
          return self.tempread

      def readall(self, namelist):
          charlist = []
          for i in namelist:
              if i.startswith('OEBPS/') and i.endswith('.xhtml'):
                  r = self.read(i).decode()
                  charlist.append(r)
          return charlist

      def epub2txt(self):
          tempread = self.get_text()
          with open(self.title + '.txt', 'w', encoding='utf8') as f:
              f.write(tempread)

  if __name__ == "__main__":
      e = Epub("Ashes.epub")
      import cProfile
      cProfile.run("e.epub2txt()")

Are you trying to do something like this:

  import os
  import _io
  from multiprocessing import Pool

  def my_read(file_name):
      with open(file_name) as f:
          return _io.TextIOWrapper.read(f)

  if __name__ == '__main__':
      with Pool() as pool:
          a = pool.map(my_read, os.listdir('some_dir'))

I find it more logical to open/close the file in the sub-process, and the resulting string is easily serialized.

For your readall method, try:

  def readall(self, namelist):
      filter_func = lambda i: i.startswith('OEBPS/') and i.endswith('.xhtml')
      read_fun = lambda i: self.read(i).decode()
      with Pool() as pool:
          a = pool.map(read_fun, filter(filter_func, namelist))
      return a
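A caveat I would add (mine, not the answerer's): lambdas cannot be pickled by the standard pickle module, and neither can the open ZipFile that read_fun closes over, so pool.map above may still fail. A rough workaround, using a hypothetical module-level helper read_member that re-opens the archive inside each worker, could look like this:

  import zipfile
  from functools import partial
  from multiprocessing import Pool

  def read_member(epub_path, name):
      # Re-open the archive in the worker so nothing unpicklable is sent across.
      with zipfile.ZipFile(epub_path) as z:
          return z.read(name).decode()

  def readall(self, namelist):
      names = [i for i in namelist
               if i.startswith('OEBPS/') and i.endswith('.xhtml')]
      with Pool() as pool:
          return pool.map(partial(read_member, self.filename), names)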
