將 scrapy 連線到 MySQL(Windows 8 專業版 64 位 python 2.7 scrapy v 1.2)
以下示例在使用 python 2.7 和 scrapy v 1.2 的 Windows 8 專業版 64 位作業系統上進行了測試。假設我們已經安裝了 scrapy 框架。 **** ****
我們將在以下教程中使用的 MySQL 資料庫
CREATE TABLE IF NOT EXISTS `scrapy_items` (
`id` bigint(20) UNSIGNED NOT NULL,
`quote` varchar(255) NOT NULL,
`author` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `scrapy_items` (`id`, `quote`, `author`)
VALUES (1, 'The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.', 'Albert Einstein');
安裝 MySQL 驅動程式
- 下載驅動程式 mysql-connector-python-2.2.1.zip OR MySQL-python-
1.2.5.zip(MD5)
- 將 zip 解壓縮到檔案中,例如 C:\ mysql-connector \
- 開啟 cmd 轉到 C:\ mysql-connector ,其中將找到 setup.py 檔案並執行 python setup.py install
- 複製並執行以下 example.py
from __future__ import print_function import mysql.connector from mysql.connector import errorcode class `MysqlTest()`: table = 'scrapy_items' conf = { 'host': '127.0.0.1', 'user': 'root', 'password': '', 'database': 'test', 'raise_on_warnings': True } def __init__(self, **kwargs): self.cnx = `self.mysql_connect()` def `mysql_connect(self)`: try: return mysql.connector.connect(**self.conf) except mysql.connector.Error as err: if err.errno == errorcode.ER_ACCESS_DENIED_ERROR: print("Something is wrong with your user name or password") elif err.errno == errorcode.ER_BAD_DB_ERROR: print("Database does not exist") else: `print(err)` def `select_item(self)`: cursor = `self.cnx.cursor()` select_query = "SELECT * FROM " + self.table `cursor.execute(select_query)` for row in `cursor.fetchall()`: `print(row)` `cursor.close()` `self.cnx.close()` def `main()`: mysql = `MysqlTest()` `mysql.select_item()` if __name__ == "__main__" : `main()`
將 Scrapy 連線到 MySQL
首先通過執行以下命令建立一個新的 scrapy 專案
scrapy startproject tutorial
這將建立一個包含以下內容的教程目錄:
https://i.stack.imgur.com/e7SqL.jpg
這是我們第一個蜘蛛的程式碼。將其儲存在專案的 tutorial / spiders 目錄下名為 quotes_spider.py 的檔案中。 ****
我們的第一個蜘蛛
import scrapy
from scrapy.loader import ItemLoader
from tutorial.items import TutorialItem
class QuotesSpider(scrapy.Spider):
name = "quotes"
def start_requests(self):
urls = ['http://quotes.toscrape.com/page/1/']
for url in urls:
yield scrapy.Request(url=url, callback=self.parse)
def parse(self, response):
boxes = response.css('div[class="quote"]')
for box in boxes:
item = ItemLoader(item=TutorialItem())
quote = box.css('span[class="text"]::text').extract_first()
author = box.css('small[class="author"]::text').extract_first()
item.add_value('quote', quote.encode('ascii', 'ignore'))
item.add_value('author', author.encode('ascii', 'ignore'))
yield item.load_item()
Scrapy 專案類
為了定義通用輸出資料格式,Scrapy 提供了 Item 類。 Item 物件是用於收集已刪除資料並指定欄位後設資料的簡單容器。它們提供類似字典的 API,並具有用於宣告其可用欄位的方便語法。詳情請點選我
import scrapy
from scrapy.loader.processors import TakeFirst
class TutorialItem(scrapy.Item):
# define the fields for your item here like:
quote = scrapy.Field(output_processor=TakeFirst(),)
author = scrapy.Field(output_processor=TakeFirst(),)
Scrapy 管道
在一個專案被蜘蛛抓取之後,它被髮送到專案管道,該專案管道通過順序執行的幾個元件處理它,這是我們將已刪除的資料儲存到資料庫中的地方。詳情請點選我
注意 :不要忘記將管道新增到 tutorial / tutorial / settings.py 檔案中的 ITEM_PIPELINES 設定。 ****
from __future__ import print_function
import mysql.connector
from mysql.connector import errorcode
class TutorialPipeline(object):
table = 'scrapy_items'
conf = {
'host': '127.0.0.1',
'user': 'root',
'password': '',
'database': 'sandbox',
'raise_on_warnings': True
}
def __init__(self, **kwargs):
self.cnx = self.mysql_connect()
def open_spider(self, spider):
print("spider open")
def process_item(self, item, spider):
print("Saving item into db ...")
self.save(dict(item))
return item
def close_spider(self, spider):
self.mysql_close()
def mysql_connect(self):
try:
return mysql.connector.connect(**self.conf)
except mysql.connector.Error as err:
if err.errno == errorcode.ER_ACCESS_DENIED_ERROR:
print("Something is wrong with your user name or password")
elif err.errno == errorcode.ER_BAD_DB_ERROR:
print("Database does not exist")
else:
print(err)
def save(self, row):
cursor = self.cnx.cursor()
create_query = ("INSERT INTO " + self.table +
"(quote, author) "
"VALUES (%(quote)s, %(author)s)")
# Insert new row
cursor.execute(create_query, row)
lastRecordId = cursor.lastrowid
# Make sure data is committed to the database
self.cnx.commit()
cursor.close()
print("Item saved with ID: {}" . format(lastRecordId))
def mysql_close(self):
self.cnx.close()