I suspect this is a trivial question, but hopefully someone can help me with a problem I've run into using lxml in a scraper I'm trying to build.
I'm working through tutorial 3 line by line and am still trying to extract the next-page link. I can use cssselect to identify the link, but I can't work out how to isolate just the href attribute rather than the whole anchor tag.
    import scraperwiki
    import lxml.html
    import lxml.etree

    def scrape_and_look_for_next_link(url):
        html = scraperwiki.scrape(url)
        print(html)
        root = lxml.html.fromstring(html)  # turn the HTML into an lxml object
        scrape_page(root)  # scrape_page() is not shown here
        next_link = root.cssselect('ol.pagination li a')[-1]

        attribute = lxml.html.tostring(next_link)
        attribute = lxml.html.fromstring(attribute)

        # works up to this point
        attribute = attribute.xpath('/@href')
        attribute = lxml.etree.tostring(attribute)
        print(attribute)
CSS selectors can select elements that have an href attribute (for example `a[href]`), but they can't extract the attribute values themselves. Once you have the element from cssselect, you get the attribute's value with `next_link.get('href')`.