first commit: Douban start
parent 49317cac55
commit f8da4997f4
@@ -0,0 +1,43 @@
# Installation
```shell
pip install scrapy
```
# Quick start
## Create the project
```shell
scrapy startproject douban
cd douban
```
## Create a spider
```shell
scrapy genspider douban_top250 book.douban.com
```
## Run the spider
```shell
scrapy crawl douban_top250
```
## Run multiple spiders
In the project root, next to `scrapy.cfg`, create a `.py` file (e.g. `run.py`) and run it with `python run.py`:
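```python
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# Load the project's settings.py (located via the nearest scrapy.cfg)
process = CrawlerProcess(get_project_settings())

# The argument is the spider's name ('douban_***' stands for a douban_ spider).
# Call process.crawl() once per spider to schedule several in the same process.
process.crawl('douban_***')
# Start the Twisted reactor; blocks until every scheduled crawl has finished
process.start()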
```
## Export data
```shell
scrapy crawl douban_top250 -o output.json
```
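The export format is inferred from the file extension (`.json`, `.csv`, `.xml`, and so on). Note that `-o` appends to an existing file, so running the command twice leaves an invalid JSON file; `-O` overwrites it instead (Scrapy 2.4+). The same export can be configured permanently in `douban/settings.py` through the `FEEDS` setting. This excerpt is a sketch that also assumes UTF-8 output, so Chinese book titles are written as-is rather than as `\uXXXX` escapes:

```python
# douban/settings.py (excerpt): roughly `-O output.json` plus UTF-8 encoding
FEEDS = {
    "output.json": {
        "format": "json",
        "encoding": "utf8",
        "overwrite": True,  # replace the file on every run instead of appending
    },
}
```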
## Use the Scrapy shell
```shell
scrapy shell "http://example.com"
```
@@ -0,0 +1,4 @@
# -*- coding: utf-8 -*-
# @Time : 2025/8/19 17:13
# @Author : zhaoxiangpeng
# @File : run.py