python 2.7 - ScraperWiki scrape query: using lxml to extract links


I suspect this is a trivial query, but I hope someone can help me with a problem I have using lxml in a scraper I am trying to build.

I am working line by line through Tutorial 3 and have got as far as trying to extract the next page link. I can use cssselect to identify the link, but I cannot work out how to isolate just the href attribute rather than the whole anchor tag.

    def scrape_and_look_for_next_link(url):
        html = scraperwiki.scrape(url)
        print html
        root = lxml.html.fromstring(html)  # turn the HTML into an lxml object
        scrape_page(root)
        next_link = root.cssselect('ol.pagination li a')[-1]
        attribute = lxml.html.tostring(next_link)
        attribute = lxml.html.fromstring(attribute)
        # works up to this point
        attribute = attribute.xpath('/href')
        attribute = lxml.etree.tostring(attribute)
        print attribute

CSS selectors can select elements that have an href attribute, for example with a[href], but they cannot extract the attribute value itself.

Once you have the element from cssselect, you can get the value of the attribute with next_link.get('href').
