zhaoxiangpeng 601cc86af8 add: Douban Books top250 1 week ago
douban_book add: Douban Books top250 1 week ago
README.md first commit: Douban start 2 weeks ago
run.py add: Douban Books top250 1 week ago
scrapy.cfg first commit: Douban Books 2 weeks ago

README.md

Installation

pip install scrapy

Quick start

Create the project

scrapy startproject douban

Create a spider

scrapy genspider douban_top250 book.douban.com

Run the spider

scrapy crawl douban_top250
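Douban may refuse requests that carry Scrapy's default User-Agent, and crawling too fast risks being blocked. A hedged sketch of settings you might add to the generated `settings.py` is below; the User-Agent string and the delay value are illustrative assumptions, not values taken from this project.

```python
# settings.py (fragment): values are illustrative assumptions

# A browser-like User-Agent instead of Scrapy's default one
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)

# Seconds to wait between consecutive requests to the same site
DOWNLOAD_DELAY = 2

# Let Scrapy adjust the delay automatically based on server response times
AUTOTHROTTLE_ENABLED = True
```

For a one-off run, a setting can also be overridden on the command line with `-s`, for example `scrapy crawl douban_top250 -s DOWNLOAD_DELAY=2`.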

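Run multiple spiders

In the project root (the directory that contains scrapy.cfg), create a new .py file, such as run.py, with the following content: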

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

process = CrawlerProcess(get_project_settings())
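# The argument is the spider's name (the `name` attribute of the spider class)
# Call process.crawl() once per spider to run several spiders in the same process
process.crawl('douban_***')
# Starts crawling and blocks until every scheduled spider has finished
process.start()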

Export data

scrapy crawl douban_top250 -o output.json
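Two details matter when exporting Chinese book titles. First, unless `FEED_EXPORT_ENCODING` is set, the JSON exporter writes non-ASCII characters as `\uXXXX` escapes. Second, `-o` appends to an existing file, so running the command twice leaves an invalid JSON file; recent Scrapy versions provide `-O` to overwrite instead. A sketch of configuring both in `settings.py`, assuming Scrapy 2.4 or later for the `overwrite` option, follows.

```python
# settings.py (fragment): assumes Scrapy 2.4+ for the "overwrite" feed option

# Write Chinese text as UTF-8 instead of \uXXXX escape sequences
FEED_EXPORT_ENCODING = "utf-8"

# Configure the export here instead of passing -o on every run
FEEDS = {
    "output.json": {
        "format": "json",
        "overwrite": True,  # replace the file on each run rather than appending
    },
}
```

With `FEEDS` set, a plain `scrapy crawl douban_top250` writes `output.json` without any command-line flag.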

Use scrapy shell

scrapy shell "http://example.com"