Connecting Scrapy to MySQL (Windows 8 Pro 64-bit, Python 2.7, Scrapy v1.2)

The following example was tested on Windows 8 Pro 64-bit with Python 2.7 and Scrapy v1.2. It assumes the Scrapy framework is already installed.

The MySQL table and seed row we will use throughout this tutorial:

CREATE TABLE IF NOT EXISTS `scrapy_items` (
  `id` bigint(20) UNSIGNED NOT NULL AUTO_INCREMENT,
  `quote` varchar(255) NOT NULL,
  `author` varchar(255) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

INSERT INTO `scrapy_items` (`id`, `quote`, `author`) 
VALUES (1, 'The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.', 'Albert Einstein');

Installing the MySQL driver

  1. Download the driver mysql-connector-python-2.2.1.zip (the examples below use mysql.connector) or MySQL-python-1.2.5.zip
  2. Extract the zip into a folder, e.g. C:\mysql-connector\
  3. Open cmd, go to C:\mysql-connector where you will find the setup.py file, and run python setup.py install
  4. Copy and run the following example.py to verify the connection (expected output is shown after the listing)
from __future__ import print_function
import mysql.connector
from mysql.connector import errorcode

class MysqlTest(object):
    table = 'scrapy_items'
    conf = {
        'host': '127.0.0.1',
        'user': 'root',
        'password': '',
        'database': 'test',
        'raise_on_warnings': True
    }

    def __init__(self, **kwargs):
        self.cnx = self.mysql_connect()

    def mysql_connect(self):
        try:
            return mysql.connector.connect(**self.conf)
        except mysql.connector.Error as err:
            if err.errno == errorcode.ER_ACCESS_DENIED_ERROR:
                print("Something is wrong with your user name or password")
            elif err.errno == errorcode.ER_BAD_DB_ERROR:
                print("Database does not exist")
            else:
                print(err)

    def select_item(self):
        cursor = self.cnx.cursor()
        select_query = "SELECT * FROM " + self.table

        cursor.execute(select_query)
        for row in cursor.fetchall():
            print(row)

        cursor.close()
        self.cnx.close()

def main():
    mysql = MysqlTest()
    mysql.select_item()

if __name__ == "__main__":
    main()
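
Running python example.py should connect to the database and print the seed row inserted above, something like this (the exact tuple formatting depends on the connector version):

(1, u'The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.', u'Albert Einstein')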

Connecting Scrapy to MySQL

First, create a new Scrapy project by running the following command:

scrapy startproject tutorial

This will create a tutorial directory with the following contents:

[Screenshot of the project layout: https://i.stack.imgur.com/e7SqL.jpg]
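
For reference, this is the skeleton that scrapy startproject generates (sketched from the standard Scrapy 1.x layout; newer versions may also add a middlewares.py):

tutorial/
    scrapy.cfg            # deploy configuration file
    tutorial/             # project's Python module
        __init__.py
        items.py          # project items definition file
        pipelines.py      # project pipelines file
        settings.py       # project settings file
        spiders/          # directory where spiders live
            __init__.py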

This is the code for our first spider. Save it in a file named quotes_spider.py under the tutorial/spiders directory of the project.

Our first spider

import scrapy
from scrapy.loader import ItemLoader
from tutorial.items import TutorialItem

class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        urls = ['http://quotes.toscrape.com/page/1/']
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        boxes = response.css('div[class="quote"]')
        for box in boxes:
            item = ItemLoader(item=TutorialItem())
            quote = box.css('span[class="text"]::text').extract_first()
            author = box.css('small[class="author"]::text').extract_first()
            item.add_value('quote', quote.encode('ascii', 'ignore'))
            item.add_value('author', author.encode('ascii', 'ignore'))
            yield item.load_item()

The Scrapy Item class

To define a common output data format, Scrapy provides the Item class. Item objects are simple containers used to collect the scraped data and specify field metadata. They provide a dictionary-like API with a convenient syntax for declaring their available fields. See the Scrapy documentation for details.

import scrapy
from scrapy.loader.processors import TakeFirst

class TutorialItem(scrapy.Item):
    # define the fields for your item here like:
    quote = scrapy.Field(output_processor=TakeFirst(),)
    author = scrapy.Field(output_processor=TakeFirst(),)
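
As an aside, the dictionary-like API mentioned above can be tried in isolation (a throwaway illustration, not part of the project files):

item = TutorialItem()
item['quote'] = 'The world as we have created it ...'
item['author'] = 'Albert Einstein'
print(item['author'])  # Albert Einstein
print(dict(item))      # a plain dict, which is what the pipeline below saves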

The Scrapy pipeline

After an item has been scraped by a spider, it is sent to the item pipeline, which processes it through several components executed sequentially. This is where we save the scraped data into the database. See the Scrapy documentation for details.

Note: do not forget to add the pipeline to the ITEM_PIPELINES setting in the tutorial/tutorial/settings.py file, for example as shown below.
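
A minimal registration, assuming the pipeline class below is saved as tutorial/tutorial/pipelines.py (300 is just an ordering priority between 0 and 1000):

ITEM_PIPELINES = {
    'tutorial.pipelines.TutorialPipeline': 300,
}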

from __future__ import print_function
import mysql.connector
from mysql.connector import errorcode

class TutorialPipeline(object):
    table = 'scrapy_items'
    conf = {
        'host': '127.0.0.1',
        'user': 'root',
        'password': '',
        'database': 'test',  # the database where scrapy_items was created
        'raise_on_warnings': True
    }
    
    def __init__(self, **kwargs):
        self.cnx = self.mysql_connect()

    def open_spider(self, spider):
        print("spider open")

    def process_item(self, item, spider):
        print("Saving item into db ...")
        self.save(dict(item))
        return item
    
    def close_spider(self, spider):
        self.mysql_close()
    
    def mysql_connect(self):
        try:
            return mysql.connector.connect(**self.conf)
        except mysql.connector.Error as err:
            if err.errno == errorcode.ER_ACCESS_DENIED_ERROR:
                print("Something is wrong with your user name or password")
            elif err.errno == errorcode.ER_BAD_DB_ERROR:
                print("Database does not exist")
            else:
                print(err)

    def save(self, row):
        cursor = self.cnx.cursor()
        create_query = ("INSERT INTO " + self.table +
            " (quote, author) "
            "VALUES (%(quote)s, %(author)s)")

        # Insert new row
        cursor.execute(create_query, row)
        lastRecordId = cursor.lastrowid

        # Make sure data is committed to the database
        self.cnx.commit()
        cursor.close()
        print("Item saved with ID: {}".format(lastRecordId))

    def mysql_close(self):
        self.cnx.close()
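
Finally, run the spider from the project root (the directory that contains scrapy.cfg):

scrapy crawl quotes

Given the print statements in the pipeline, each scraped quote should show up in the crawl log as "Saving item into db ..." followed by "Item saved with ID: <n>".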

Reference: https://doc.scrapy.org/en/latest/index.html