scrapy - sqlite3 database error .. I really get tired :(


I really need help accomplishing this task because it is related to my research, and I am new to Python and Scrapy.

* The task is to select all the input fields (type = text, password, or file) and store them in a back-end db, along with the link of the page each input belongs to.

My code for selecting the input fields:

    def parse_item(self, response):
        self.log('%s' % response.url)
        hxs = HtmlXPathSelector(response)
        item = IsaItem()
        item['response_fld'] = response.url
        item['text_input'] = hxs.select("//input[(@id or @name) and (@type = 'text')]/@id").extract()
        item['pass_input'] = hxs.select("//input[(@id or @name) and (@type = 'password')]/@id").extract()
        item['file_input'] = hxs.select("//input[(@id or @name) and (@type = 'file')]/@id").extract()
        return item

Database pipeline code:

    class SQLiteStorePipeline(object):

        def __init__(self):
            self.conn = sqlite3.connect('./project.db')
            self.cur = self.conn.cursor()

        def process_item(self, item, spider):
            self.cur.execute("insert into input (input_name) values(?)", (item['text_input'][0],))
            self.cur.execute("insert into input (input_name) values(?)", (item['pass_input'][0],))
            self.cur.execute("insert into input (input_name) values(?)", (item['file_input'][0],))
            self.cur.execute("insert into link values(?)", (item['response_fld'][0],))

It fails with this error:

        self.cur.execute("insert into input (input_name) values(?)", (item['text_input'][0],))
    exceptions.IndexError: list index out of range

Or the database stores only the first letter!!

The database tables:

    link table:
    | link |

    input table:
    | id | input |
    | 1  | t     |
    | 2  | t     |

Note that it should be "tbPassword" or "tbUsername".

Output from the JSON file:

    {"pass_input": ["tbPassword"], "file_input": [], "response_fld": "http://testaspnet.vulnweb.com/signup.aspx", "text_input": ["tbUsername"]}
    {"pass_input": [], "file_input": [], "response_fld": "http://testaspnet.vulnweb.com/default.aspx", "text_input": []}
    {"pass_input": ["tbPassword"], "file_input": [], "response_fld": "http://testaspnet.vulnweb.com/login.aspx", "text_input": ["tbUsername"]}
    {"pass_input": [], "file_input": [], "response_fld": "http://testaspnet.vulnweb.com/comments.aspx?id=0", "text_input": []}

You are getting the IndexError because you try to take the first item of the list, which is sometimes empty.

I would do it like this.
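To make the failure concrete, here is a small sketch using plain lists shaped like the ones in the JSON output above. The helper name `first_or_none` is hypothetical and only serves the illustration; it is not part of Scrapy.

```python
def first_or_none(values):
    """Return the first element of a list, or None when the list is empty."""
    return values[0] if values else None


# Shapes taken from the JSON output in the question.
signup = {"text_input": ["tbUsername"], "file_input": []}

print(first_or_none(signup["text_input"]))  # tbUsername
print(first_or_none(signup["file_input"]))  # None

try:
    signup["file_input"][0]  # indexing an empty list
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list index out of range
```

Any page without a matching input field yields an empty list, so the unguarded `[0]` is what breaks the pipeline on those pages.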

Spider:

    def parse_item(self, response):
        self.log('%s' % response.url)

        hxs = HtmlXPathSelector(response)
        item = IsaItem()
        item['response_fld'] = response.url

        res = hxs.select("//input[(@id or @name) and (@type = 'text')]/@id").extract()
        item['text_input'] = res[0] if res else None  # None is the default value if no field is found

        res = hxs.select("//input[(@id or @name) and (@type = 'password')]/@id").extract()
        item['pass_input'] = res[0] if res else None  # None is the default value if no field is found

        res = hxs.select("//input[(@id or @name) and (@type = 'file')]/@id").extract()
        item['file_input'] = res[0] if res else None  # None is the default value if no field is found

        return item

Pipeline:

    class SQLiteStorePipeline(object):

        def __init__(self):
            self.conn = sqlite3.connect('./project.db')
            self.cur = self.conn.cursor()

        def process_item(self, item, spider):
            # The spider already reduced each field to a single value, so no [0] here.
            self.cur.execute("insert into input (input_name) values(?)", (item['text_input'],))
            self.cur.execute("insert into input (input_name) values(?)", (item['pass_input'],))
            self.cur.execute("insert into input (input_name) values(?)", (item['file_input'],))
            self.cur.execute("insert into link values(?)", (item['response_fld'],))
            self.conn.commit()
            return item
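The "first letter" symptom most likely comes from applying `[0]` to a string: `response.url` is a string, so `item['response_fld'][0]` is just its first character. The sketch below reproduces the storage step outside Scrapy with an in-memory database. The table definitions are an assumption (the post does not show the schema), the `store` function is a hypothetical stand-in for `process_item`, and skipping `None` values is one possible choice rather than what the pipeline above does (which would insert NULL).

```python
import sqlite3

# Assumed schema: the original post does not show the CREATE TABLE statements.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table input (id integer primary key, input_name text)")
cur.execute("create table link (link text)")

# An item shaped like the one the corrected spider would produce.
item = {
    "response_fld": "http://testaspnet.vulnweb.com/login.aspx",
    "text_input": "tbUsername",
    "pass_input": "tbPassword",
    "file_input": None,  # no file field on this page
}


def store(cur, item):
    """Insert the found input ids and the page link; skip fields that are None."""
    for key in ("text_input", "pass_input", "file_input"):
        if item[key] is not None:
            # The parameters must be a one-element tuple, hence the trailing comma.
            cur.execute("insert into input (input_name) values (?)", (item[key],))
    cur.execute("insert into link (link) values (?)", (item["response_fld"],))


store(cur, item)
conn.commit()

stored_inputs = [row[0] for row in cur.execute("select input_name from input order by id")]
stored_links = [row[0] for row in cur.execute("select link from link")]

print(stored_inputs)            # ['tbUsername', 'tbPassword']
print(stored_links)             # ['http://testaspnet.vulnweb.com/login.aspx']
print(item["response_fld"][0])  # h  (indexing a string yields its first character)
```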
