multithreading - Can I accelerate python file method `read()` by parallelism?


I have many files to read (300 ~ 500), and I want to speed up this task.

Here is the illustration:

  import os
  import _io
  from multiprocessing import Pool

  filelist = map(open, os.listdir())

  if __name__ == '__main__':
      with Pool() as pool:
          a = pool.map(_io.TextIOWrapper.read, filelist)

Of course, I got an error:

  TypeError: cannot serialize '_io.TextIOWrapper' object
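For context, here is a minimal sketch (my addition, not from the original post) of why this happens: multiprocessing ships arguments to worker processes with pickle, and an open file object cannot be pickled, whereas the string returned by read() can. The file name below is a placeholder.

  import pickle

  # A plain string pickles fine, so returning file *contents* from a worker works.
  pickle.dumps("some file contents")

  # An open file handle cannot be pickled; the exact message varies by Python
  # version ("cannot serialize" / "cannot pickle" '_io.TextIOWrapper' object).
  try:
      pickle.dumps(open("some_file.txt"))   # placeholder file name
  except TypeError as e:
      print(e)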

The question is: can I speed up the I/O process with parallelism? If so, how?

Updated findings:

Now I have found a way to parallelize it, and my code has been tested:

I used 22 files, 63.2 MB in total, for the test:

  from multiprocessing import Pool
  import os

  def my_read(file_name):
      with open(file_name) as f:
          return f.read()

  def mul():
      with Pool() as pool:
          a = pool.map(my_read, os.listdir())

  def single():
      a = []
      for i in os.listdir():
          with open(i) as f:
              r = f.read()
              a.append(r)

Unfortunately, single() costs 0.4 s while mul() costs 0.8 s.
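(Not part of the original post.) A minimal way to reproduce such a comparison, assuming the single() and mul() functions above are defined in the same script and the working directory contains only ordinary text files:

  import time

  if __name__ == '__main__':
      start = time.perf_counter()
      single()
      print('single():', time.perf_counter() - start, 's')

      start = time.perf_counter()
      mul()
      print('mul():', time.perf_counter() - start, 's')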

Update 1:

Some people said that this is an I/O-bound task, so I cannot improve it by parallelism. However, I found these words in the documentation:

  However, threading is still a suitable model if you want to run multiple I/O-bound tasks together.
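One hedged reading of that advice (my sketch, not code from the post): use threads rather than processes for the I/O-bound part, e.g. with multiprocessing.pool.ThreadPool, so no pickling and no extra process start-up is involved:

  import os
  from multiprocessing.pool import ThreadPool  # thread-based counterpart of Pool

  def my_read(file_name):
      # Open and close the file inside the worker thread; only the string comes back.
      with open(file_name) as f:
          return f.read()

  if __name__ == '__main__':
      # Assumes the current directory contains only readable text files.
      with ThreadPool() as pool:
          contents = pool.map(my_read, os.listdir())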

The complete code is here:

My intention is to convert epub files to txt.

I have parallelized char2text, and now I want to parallelize readall:

  import zipfile
  from multiprocessing import Pool
  import bs4

  def char2text(i):
      soup = bs4.BeautifulSoup(i)
      chapters = soup.body.getText().splitlines()
      chapter = "\n".join(chapters).strip() + "\n\n"
      return chapter

  class Epub(zipfile.ZipFile):
      def __init__(self, file, mode='r', compression=0, allowZip64=False):
          zipfile.ZipFile.__init__(self, file, mode, compression, allowZip64)
          if mode == 'r':
              self.opf = self.read('OEBPS/content.opf').decode()
              opf_soup = bs4.BeautifulSoup(self.opf)
              self.author = opf_soup.find(name='dc:creator').getText()
              self.title = opf_soup.find(name='dc:title').getText()
              try:
                  self.description = opf_soup.find(name='dc:description').getText()
              except:
                  self.description = ''
              try:
                  self.chrpattern = opf_soup.find(name='dc:chrpattern').getText()
              except:
                  self.chrpattern = ''
              self.cover = self.read('OEBPS/images/cover.jpg')
          elif mode == 'w':
              pass

      def get_text(self):
          self.tempread = ""
          charlist = self.readall(self.namelist())
          with Pool() as pool:
              txtlist = pool.map(char2text, charlist)
          self.tempread = "".join(txtlist)
          return self.tempread

      def readall(self, namelist):
          charlist = []
          for i in namelist:
              if i.startswith('OEBPS/') and i.endswith('.xhtml'):
                  r = self.read(i).decode()
                  charlist.append(r)
          return charlist

      def epub2txt(self):
          tempread = self.get_text()
          with open(self.title + '.txt', 'w', encoding='utf8') as f:
              f.write(tempread)

  if __name__ == "__main__":
      e = Epub("Ashes.epub")
      import cProfile
      cProfile.run("e.epub2txt()")

Are you trying to do something like this:

  import os
  import _io
  from multiprocessing import Pool

  def my_read(file_name):
      with open(file_name) as f:
          return _io.TextIOWrapper.read(f)

  if __name__ == '__main__':
      with Pool() as pool:
          a = pool.map(my_read, os.listdir('some_dir'))

I find it more logical to open/close the file in the sub-process, and the resulting string is easily serialized.

For your readall method, try:

  def readall(self, namelist):
      filter_func = lambda i: i.startswith('OEBPS/') and i.endswith('.xhtml')
      read_fun = lambda i: self.read(i).decode()
      with Pool() as pool:
          a = pool.map(read_fun, filter(filter_func, namelist))
      return a
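A caveat I would add (mine, not the answerer's): lambdas cannot be pickled by the standard pickle module, and neither can the open ZipFile that read_fun closes over, so pool.map above may still fail. A rough workaround, using a hypothetical module-level helper read_member that re-opens the archive inside each worker, could look like this:

  import zipfile
  from functools import partial
  from multiprocessing import Pool

  def read_member(epub_path, name):
      # Re-open the archive in the worker so nothing unpicklable is sent across.
      with zipfile.ZipFile(epub_path) as z:
          return z.read(name).decode()

  def readall(self, namelist):
      names = [i for i in namelist
               if i.startswith('OEBPS/') and i.endswith('.xhtml')]
      with Pool() as pool:
          return pool.map(partial(read_member, self.filename), names)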
