multithreading - Can I accelerate python file method `read()` by parallelism? -
I have many files to read (300 ~ 500), and I want to speed this task up.

Here is an illustration:

```python
import multiprocessing
import os
import _io

filelist = map(open, os.listdir())

if __name__ == '__main__':
    with multiprocessing.Pool() as pool:
        a = pool.map(_io.TextIOWrapper.read, filelist)
```

Of course, I got an error:

    TypeError: cannot serialize '_io.TextIOWrapper' object

The question is: can I speed up the I/O process with parallelism? If so, how?
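The error can be reproduced in isolation: `multiprocessing` pickles every argument before sending it to a worker process, and open file objects refuse to be pickled. A minimal sketch (the temporary file is just a stand-in for any file on disk):

```python
import os
import pickle
import tempfile

# Create a small file and open it, so we have a real _io.TextIOWrapper.
fd, path = tempfile.mkstemp(text=True)
os.close(fd)
f = open(path)

# An open file object wraps OS-level state (a file descriptor), so the
# standard pickler raises TypeError -- the same failure pool.map() hits
# when the iterable contains open files instead of file names.
try:
    pickle.dumps(f)
    print("pickled OK")
except TypeError as err:
    print("pickling failed:", err)

f.close()
os.remove(path)
```

This is why the answers below pass file *names* to the pool and open each file inside the worker: strings pickle trivially, file objects do not.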
Update:

Now I have found a way to do it in parallel, and I have tested my code.

I used 22 files, 63.2 MB in total:

```python
from multiprocessing import Pool
import os

def my_read(file_name):
    with open(file_name) as f:
        return f.read()

def mul():
    with Pool() as pool:
        a = pool.map(my_read, os.listdir())

def single():
    a = []
    for i in os.listdir():
        with open(i) as f:
            r = f.read()
        a.append(r)
```

Unfortunately, single() costs 0.4 s while mul() costs 0.8 s.
Update 1:

Some people said this is an I/O-bound task, so it cannot be improved by parallelism. However, I found these words in the Python documentation:

> However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.
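The documentation's suggestion can be sketched with a thread pool instead of a process pool: threads share the interpreter's memory, so nothing needs to be pickled, and the GIL is released during blocking `read()` calls. A minimal sketch (the directory and file contents are illustrative):

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def my_read(file_name):
    # Each task opens, reads, and closes its own file.
    with open(file_name) as f:
        return f.read()

# Illustrative setup: a temporary directory with a few small files.
with tempfile.TemporaryDirectory() as d:
    paths = []
    for n in range(3):
        p = os.path.join(d, f"{n}.txt")
        with open(p, "w") as f:
            f.write(f"contents of file {n}")
        paths.append(p)

    # executor.map preserves input order, like pool.map.
    with ThreadPoolExecutor() as ex:
        texts = list(ex.map(my_read, paths))

print(texts[0])  # -> contents of file 0
```

Whether threads actually help depends on the storage: on a spinning disk concurrent reads can even hurt through seek contention, while on SSDs or network filesystems overlapping requests tends to pay off.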
The complete code is here:

My intention is to convert epub to txt. I have parallelized char2text, and now I want to parallelize readall:

```python
from multiprocessing import Pool
import zipfile
import bs4

def char2text(i):
    soup = bs4.BeautifulSoup(i)
    chapters = soup.body.getText().splitlines()
    chapter = "\n".join(chapters).strip() + "\n\n"
    return chapter

class Epub(zipfile.ZipFile):
    def __init__(self, file, mode='r', compression=0, allowZip64=False):
        zipfile.ZipFile.__init__(self, file, mode, compression, allowZip64)
        if mode == 'r':
            self.opf = self.read('OEBPS/content.opf').decode()
            opf_soup = bs4.BeautifulSoup(self.opf)
            self.author = opf_soup.find(name='dc:creator').getText()
            self.title = opf_soup.find(name='dc:title').getText()
            try:
                self.description = opf_soup.find(name='dc:description').getText()
            except:
                self.description = ''
            try:
                self.chrpattern = opf_soup.find(name='dc:chrpattern').getText()
            except:
                self.chrpattern = ''
            self.cover = self.read('OEBPS/images/cover.jpg')
        elif mode == 'w':
            pass

    def get_text(self):
        self.tempread = ""
        charlist = self.readall(self.namelist())
        with Pool() as pool:
            txtlist = pool.map(char2text, charlist)
        self.tempread = "".join(txtlist)
        return self.tempread

    def readall(self, namelist):
        charlist = []
        for i in namelist:
            if i.startswith('OEBPS/') and i.endswith('.xhtml'):
                r = self.read(i).decode()
                charlist.append(r)
        return charlist

    def epub2txt(self):
        tempread = self.get_text()
        with open(self.title + '.txt', 'w', encoding='utf8') as f:
            f.write(tempread)

if __name__ == "__main__":
    e = Epub("Ashes.epub")
    import cProfile
    cProfile.run("e.epub2txt()")
```
Are you trying to do something like this:

```python
import multiprocessing
import os
import _io

def my_read(file_name):
    with open(file_name) as f:
        return _io.TextIOWrapper.read(f)

if __name__ == '__main__':
    with multiprocessing.Pool() as pool:
        a = pool.map(my_read, os.listdir('some_dir'))
```

I find it more logical to open/close the file in the sub-process, and the string it returns is easily serialized.
For your readall method, try:

```python
def readall(self, namelist):
    filter_func = lambda i: i.startswith('OEBPS/') and i.endswith('.xhtml')
    read_func = lambda i: self.read(i).decode()
    with Pool() as pool:
        a = pool.map(read_func, filter(filter_func, namelist))
    return a
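One caveat with the version above: `multiprocessing`'s default pickler cannot send lambdas (or bound methods like `self.read`) to worker processes, so `pool.map(read_func, ...)` will raise a pickling error. A variant that keeps every pickled object simple uses a module-level function and passes the archive path along, so each worker reopens the archive itself. This is a sketch, not the original poster's code; `read_member` and the `(path, name)` tuple convention are my own assumptions:

```python
from multiprocessing import Pool
import zipfile

def read_member(args):
    # Module-level functions pickle cleanly, unlike lambdas or bound
    # methods. Each worker reopens the archive rather than sharing an
    # open handle across processes.
    archive_path, name = args
    with zipfile.ZipFile(archive_path) as z:
        return z.read(name).decode()

def readall(archive_path, namelist):
    # Keep only the chapter files, preserving their order in namelist.
    wanted = [(archive_path, i) for i in namelist
              if i.startswith('OEBPS/') and i.endswith('.xhtml')]
    with Pool() as pool:
        return pool.map(read_member, wanted)
```

Reopening a ZIP per member adds some overhead, but it keeps the parallel part free of unpicklable state, which is the usual trade-off with `multiprocessing`.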