5.3.5. Web crawling

Selenium is a browser automation and testing tool. It supports the mainstream GUI browsers, including Chrome, Safari, and Firefox. Install it with pip:

pip install selenium   # installs into /Library/Python/2.7/site-packages/

Selenium 2 is the merger of two projects, Selenium and WebDriver.

chromedriver download page:

https://sites.google.com/a/chromium.org/chromedriver/downloads
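
Once selenium and chromedriver are both installed, driving Chrome from Python could look roughly like the sketch below. The helper name fetch_title is hypothetical, and the sketch assumes that chromedriver can be located by Selenium (for example, because it is on the PATH).

```python
def fetch_title(url):
    """Open url in Chrome and return the page title (hypothetical helper)."""
    # Imported here so that merely defining the function does not require selenium.
    from selenium import webdriver

    # Assumption: chromedriver can be found by Selenium, e.g. it is on the PATH.
    driver = webdriver.Chrome()
    try:
        driver.get(url)      # navigate and wait for the page to load
        return driver.title  # title of the currently loaded page
    finally:
        driver.quit()        # always close the browser, even on errors
```

Calling fetch_title with a page URL opens a Chrome window, loads the page, and closes the browser again before returning.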

Scrapy project: an open-source crawler system. Its selectors accept XPath expressions such as:

  • /html/head/title : selects the <title> element, inside the <head> element of an HTML document
  • /html/head/title/text() : selects the text inside the aforementioned <title> element.
  • //td : selects all the <td> elements
  • //div[@class="mine"] : selects all <div> elements that have the attribute class="mine"
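
The four expressions above can be tried out without installing Scrapy. The sketch below uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath, so the absolute paths are written relative to the root element and text() becomes the .text attribute. The sample page is made up for illustration. In Scrapy itself the expressions would be passed unchanged to the selector's xpath() method.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page; it must be well-formed XML for ElementTree to parse it.
SAMPLE = """<html>
  <head><title>Example page</title></head>
  <body>
    <div class="mine">first</div>
    <div class="other">second</div>
    <table><tr><td>a</td><td>b</td></tr></table>
  </body>
</html>"""

root = ET.fromstring(SAMPLE)  # root is the <html> element

# /html/head/title : written relative to the root element here
title = root.find("head/title")

# /html/head/title/text() : ElementTree exposes the text as an attribute
title_text = title.text

# //td : all <td> elements anywhere below the root
cells = root.findall(".//td")

# //div[@class="mine"] : all <div> elements whose class attribute equals "mine"
mine_divs = root.findall(".//div[@class='mine']")

print(title_text)
print([td.text for td in cells])
print([div.text for div in mine_divs])
```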

Lightweight crawlers: