first commit: Douban start

5 months ago · f8da4997f4
parent 49317cac55
commit f8da4997f4
2 changed files with 47 additions and 0 deletions
--- a/douban_book/README.md
+++ b/douban_book/README.md
@@ -0,0 +1,43 @@
+# Installation
+```shell
+pip install scrapy
+```
+
+# Quick start
+
+## Create a project
+```shell
+scrapy startproject douban
+```
+
+## Create a spider
+```shell
+scrapy genspider douban_top250 book.douban.com
+```
+
+## Run the spider
+```shell
+scrapy crawl douban_top250
+```
+
+## Run multiple spiders
+In the project root, at the same level as `scrapy.cfg`, create a new `.py` file:
+```python
+from scrapy.crawler import CrawlerProcess
+from scrapy.utils.project import get_project_settings
+
+process = CrawlerProcess(get_project_settings())
+# The argument is the spider's name; call process.crawl() once per spider
+process.crawl('douban_***')
+process.start()
+```
+
+## Export data
+```shell
+scrapy crawl douban_top250 -o output.json
+```
+
+## Use the scrapy shell
+```shell
+scrapy shell "http://example.com"
+```
--- a/douban_book/run.py
+++ b/douban_book/run.py
@@ -0,0 +1,4 @@
+# -*- coding: utf-8 -*-
+# @Time    : 2025/8/19 17:13
+# @Author  : zhaoxiangpeng
+# @File    : run.py
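
A note on the README's "run multiple spiders" section: the `'douban_***'` placeholder suggests one `process.crawl()` call per spider. The following is a minimal sketch of one way to schedule every spider whose name matches a pattern. It is not the author's `run.py`. The helper names `select_spiders` and `run_all` and the default pattern `douban_*` are assumptions made for illustration; `CrawlerProcess`, `get_project_settings`, and `spider_loader.list()` are existing Scrapy APIs.

```python
from fnmatch import fnmatchcase


def select_spiders(available, pattern="douban_*"):
    """Return the spider names matching a shell-style pattern, sorted."""
    return sorted(name for name in available if fnmatchcase(name, pattern))


def run_all(pattern="douban_*"):
    """Schedule every matching spider in one process (hypothetical helper)."""
    # Imported lazily so select_spiders() works without Scrapy installed.
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    process = CrawlerProcess(get_project_settings())
    # spider_loader.list() returns the names of all spiders in the project.
    for name in select_spiders(process.spider_loader.list(), pattern):
        process.crawl(name)
    # Blocks until every scheduled crawl has finished.
    process.start()
```

Calling `run_all()` from a script placed next to `scrapy.cfg` would start all matching spiders in a single process, so the project settings are loaded once and shared.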