# weixin

**Repository Path**: icepear_eric/weixin

## Basic Information

- **Project Name**: weixin
- **Description**: Crawls articles from WeChat official accounts
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-10-10
- **Last Updated**: 2020-12-19

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# weixin
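Scrapes WeChat official account articles through Sogou WeChat search.
Because WeChat official account pages are rendered dynamically, they cannot be scraped with requests, so selenium is used instead. The project consists of the following modules: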

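1. URL queue storage module db.py:
    Uses redis for storage. To simplify scheduling, each url is combined with its corresponding parsing module into a list, and that list is stored in a redis list.
    The URL queue is first-in, first-out (FIFO).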
    
2. Data storage module mongo.py:
    The scraped data is ultimately stored in mongodb.
    
3. Scraping module:
    The scraping module is itself made up of a request module, a parsing module, and a scheduling module.
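
The FIFO URL queue and the way the scheduler pairs each url with its parser can be sketched roughly as follows. This is a minimal illustration, not the code in db.py: the names `UrlQueue`, `InMemoryListClient`, `run_scheduler`, the key `weixin:urls`, and the JSON encoding of the `[url, parser_name]` pair are all assumptions made for the example. The in-memory client only mimics the two redis list commands used here (`rpush` and `lpop`), so the sketch runs without a redis server.

```python
import json
from collections import deque


class InMemoryListClient:
    """Hypothetical stand-in exposing the two redis list commands used below."""

    def __init__(self):
        self._lists = {}

    def rpush(self, name, *values):
        # Append values to the tail of the named list, like redis RPUSH.
        self._lists.setdefault(name, deque()).extend(values)
        return len(self._lists[name])

    def lpop(self, name):
        # Remove and return the head of the named list, or None if it is empty.
        items = self._lists.get(name)
        return items.popleft() if items else None


class UrlQueue:
    """FIFO queue of [url, parser_name] pairs kept in a redis-style list."""

    def __init__(self, client, key="weixin:urls"):  # key name is an assumption
        self.client = client
        self.key = key

    def push(self, url, parser_name):
        # Store the url together with the name of the parser that handles it.
        self.client.rpush(self.key, json.dumps([url, parser_name]))

    def pop(self):
        # Popping from the head while pushing to the tail gives FIFO order.
        raw = self.client.lpop(self.key)
        if raw is None:
            return None
        url, parser_name = json.loads(raw)
        return url, parser_name


def run_scheduler(queue, fetch, parsers):
    """Drain the queue: fetch each url, then pass the page to its named parser."""
    results = []
    while True:
        item = queue.pop()
        if item is None:
            break
        url, parser_name = item
        page = fetch(url)  # in the project this step would be done with selenium
        results.append(parsers[parser_name](page))
    return results
```

With a running redis server and the redis-py package, a client such as `redis.Redis(decode_responses=True)` could presumably be passed to `UrlQueue` in place of the stand-in, since it offers the same `rpush` and `lpop` calls. The parsed results returned by `run_scheduler` are what the mongo.py module would then persist.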