Could someone take a look and tell me why this code isn't scraping any data?
```python
# -*- coding: utf-8 -*-
import time
import re
import json
from start.loan import Loan
from scrapy.spider import Spider
from scrapy.selector import Selector
from start.items import LoanItem


class hepaiSpider(Spider):
    name = "hepai"
    allowed_domains = ["he-pai.cn"]
    start_urls = [
        "http://www.he-pai.cn/investmentDetail/investmentDetails/index.do"
    ]

    def parse(self, response):
        sel = Selector(response)
        items = []
        loan = Loan()
        sites = sel.xpath("//div[@class='tabs_con']")
        for site in sites:
            item = LoanItem()
            item['company_name'] = '合拍在线'
            item['title'] = site.xpath("/table[@class='tzlb']/a/text()").extract()
            item['rate'] = site.xpath("//table/tbody/tr/td[2]/text()").extract()
            item['amount'] = site.xpath("//table/tboday/td[3]/text()").extract()
            item['public_time'] = site.xpath("//table/tbody/td[4]/text()").extract()
            item['process'] = site.xpath("//table/tbody/td[5]/div/div[1]/@style").extract()
            item['period'] = site.xpath("//table/tbody/td[6]/text()").extract()
            items.append(item)
        return items
```
---
After running the crawl, the output looks like this:
```
6 14:00:30+0800 [hepai] DEBUG: Scraped from <200 http://www.he-pai.cn/investmentDetail/investmentDetails/index.do>
{'amount': [],
 'company_name': '\xe5\x90\x88\xe6\x8b\x8d\xe5\x9c\xa8\xe7\xba\xbf',
 'period': [],
 'process': [],
 'public_time': [],
 'rate': [],
 'title': []}
```
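The company_name value in that log is not garbled. The \xe5\x90\x88... form is how Python 2 (which this older Scrapy setup appears to run on) prints a UTF-8 byte string, so the literal '合拍在线' was stored correctly; only the extracted fields are empty. A quick Python 3 check, decoding the bytes copied from the log:

```python
# Bytes copied from the company_name field in the log above
raw = b'\xe5\x90\x88\xe6\x8b\x8d\xe5\x9c\xa8\xe7\xba\xbf'

# Decoding them as UTF-8 gives back the original company name
decoded = raw.decode('utf-8')
print(decoded)  # 合拍在线
```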
About a crawler that can't scrape any data
Answers: 2  Bounty: 30
Resolved: 2021-04-09 04:51
- Asked by: 不爱我么
- 2021-04-08 14:51
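Best answer
- Five-star expert: 蕴藏春秋
- 2021-04-08 15:56
Isn't your site.xpath("/table[@class='tzlb']/a/text()").extract() written wrong? An XPath that starts with / is evaluated from the root of the whole document, not from site, so /table only matches a table that is the top-level element of the page, and in HTML the top-level element is <html>. It should be a relative path such as .//table[@class='tzlb']//a/text().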
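All answers
- Reply #1, user: 酒安江南
- 2021-04-08 16:14
A site that has been de-indexed ("K'd") by a search engine won't be crawled by that engine's spider, because it is already on a blacklist.
Another case is a site the spider hasn't discovered yet and hasn't indexed; that can't be crawled either.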