# py-sec.gov

**Repository Path**: Tony36051/py-sec.gov

## Basic Information

- **Project Name**: py-sec.gov
- **Description**: No description available
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-06-07
- **Last Updated**: 2025-08-02

## Categories & Tags

**Categories**: Uncategorized

**Tags**: archived, American female PhD

## README

# Scrapy Project for sec.gov

Crawl https://www.sec.gov/edgar/search/# with Scrapy spiders.

## Installation

1. Install Python 3.
2. Open a Command Prompt window: press `Win+R`, type `cmd`, and press `Enter`.
3. Install the package: type `pip install scrapy` and press `Enter`.

## How to run

First, navigate to the `scrapy project folder` and open a Command Prompt window there. You can follow these steps:

> Open File Explorer and navigate to the folder where you need to open the Command Prompt. Click inside the location bar and type ‘cmd’, without the quote marks. Press the Enter key and a Command Prompt window will open in that location.

Second, prepare your `original.csv` as input, providing the CIK values. Put `original.csv` into the sub-project directory (`def14a`/`payratio`/`scrapy_10k`).

### run spider for def14a

> `scrapy runspider -O result.csv def14a\spiders\Def14aURLSpider.py`

### run spider for payratio

> `scrapy runspider -O result.csv payratio\spiders\PayRatioSpider.py`

### run spider for scrapy_10k

> `scrapy runspider -O result.csv scrapy_10k\spiders\spider10k.py`

## Code structure

Take `scrapy_10k` as an example:

```text
\---scrapy_10k
    |   Firms-10-K.csv
    |   Firms-10-K_manual_one.csv
    |   main_10k.py
    |   res.csv
    |   result_10k.xlsx
    |   scrapy.cfg
    |
    \---scrapy_10k
        |   items.py
        |   middlewares.py
        |   pipelines.py
        |   settings.py
        |   __init__.py
        |
        \---spiders
                spider10k.py
                __init__.py
```

Entry files: `main_10k.py`, `spider10k.py`

Main process logic in `main_10k.py`

input/output files in

## Output File

The output file is in `.csv` format, and the author copies and pastes the data into an `.xlsx` file by hand. The `click me` column is generated with the Excel function `HYPERLINK`.
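
The manual step of building the `click me` column could in principle be scripted. The following is a minimal sketch, not the project's actual code: it assumes the spider output has a column named `url` (a hypothetical name; the real column names are not documented here) and appends a `click me` column containing an Excel `HYPERLINK` formula. The helper names `hyperlink_formula` and `add_link_column` are illustrative only.

```python
import csv
import io


def hyperlink_formula(url: str, label: str = "click me") -> str:
    """Build an Excel HYPERLINK formula; double quotes are escaped by doubling."""
    safe_url = url.replace('"', '""')
    safe_label = label.replace('"', '""')
    return f'=HYPERLINK("{safe_url}","{safe_label}")'


def add_link_column(src, dst, url_field: str = "url",
                    link_field: str = "click me") -> None:
    """Copy CSV rows from src to dst, appending a HYPERLINK formula column.

    `url_field` is an assumed column name, not one confirmed by the project.
    """
    reader = csv.DictReader(src)
    fieldnames = list(reader.fieldnames or []) + [link_field]
    writer = csv.DictWriter(dst, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        row[link_field] = hyperlink_formula(row[url_field])
        writer.writerow(row)


if __name__ == "__main__":
    # Tiny in-memory demo; the CIK below is a placeholder, not real data.
    demo = io.StringIO("cik,url\n1234567,https://www.sec.gov/edgar/search/#\n")
    out = io.StringIO()
    add_link_column(demo, out)
    print(out.getvalue())
```

To use it on real files, open both with `newline=""` as the `csv` module recommends, for example `open("result.csv", newline="")` for reading and `open("result_links.csv", "w", newline="")` for writing (the second file name is also just an example). Excel normally evaluates cells beginning with `=` as formulas when it opens a CSV, so the links should become clickable, though this depends on the Excel version and its security settings.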