Installation
pip install scrapy
Quick start
Create a project
scrapy startproject douban
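scrapy startproject douban generates, among other files, a settings.py inside the inner douban/ package. The excerpt below is a hedged sketch of settings that are commonly adjusted before crawling a site such as Douban. The setting names are standard Scrapy settings, but the values and the placeholder USER_AGENT string are assumptions to tune for your own use.

```python
# settings.py (excerpt); all values below are illustrative assumptions
BOT_NAME = "douban"

# Keep honouring robots.txt (the generated project template already sets this to True).
ROBOTSTXT_OBEY = True

# Seconds to wait between requests to the same site, and at most one request in flight per domain.
DOWNLOAD_DELAY = 2
CONCURRENT_REQUESTS_PER_DOMAIN = 1

# Let the AutoThrottle extension adapt the delay to the server's response times.
AUTOTHROTTLE_ENABLED = True

# Hypothetical identifier; the site may reject Scrapy's default user agent.
USER_AGENT = "douban-book-tutorial (+https://example.com/contact)"

# Write non-ASCII text (e.g. Chinese titles) as UTF-8 in JSON feeds instead of \uXXXX escapes.
FEED_EXPORT_ENCODING = "utf-8"
```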
Create a spider
scrapy genspider douban_top250 book.douban.com
Run the spider
scrapy crawl douban_top250
Run multiple spiders
In the project root directory (the same directory as scrapy.cfg), create a new .py file, for example run.py:
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
process = CrawlerProcess(get_project_settings())
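# Argument: the name of the spider to run
# Call process.crawl() once per spider to run several spiders in the same process
process.crawl('douban_***')
# Blocks until every scheduled crawl has finished
process.start()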
Export data
scrapy crawl douban_top250 -o output.json
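The JSON feed is a single array of item objects, so it can be read back with the standard library. In recent Scrapy versions -o appends to an existing file, which for JSON produces an invalid document on a second run, while -O overwrites it. The hypothetical helpers below load the file and collect titles, assuming items carry a title field (an assumption about the spider's output).

```python
import json
from pathlib import Path


def load_items(path="output.json"):
    """Read the array of items written by `scrapy crawl ... -o output.json`."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def titles(items):
    # Assumes items are dicts; items without a "title" key are skipped.
    return [item["title"] for item in items if "title" in item]
```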
Use the Scrapy shell
scrapy shell "http://example.com"