The Basic Routine for Web Scraping with BeautifulSoup in Python (ITeye)


2019-01-10 14:22:56 | Author: 宣朗

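import time

import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent with every request
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'
}


def get_info(url):
    # Fetch the page and parse it with the lxml parser
    wb_data = requests.get(url, headers=headers)
    soup = BeautifulSoup(wb_data.text, 'lxml')
    # CSS selectors for the rank, the "singer - song" link, and the duration
    ranks = soup.select('span.pc_temp_num')
    titles = soup.select('div.pc_temp_songlist ul li a')
    times = soup.select('span.pc_temp_tips_r span')
    for rank, title, song_time in zip(ranks, titles, times):
        data = {
            'rank': rank.get_text().strip(),
            # The link text has the form "singer - song"
            'singer': title.get_text().split('-')[0],
            'song': title.get_text().split('-')[1],
            'time': song_time.get_text().strip()
        }
        print(data)


if __name__ == '__main__':
    urls = ['http://www.kugou.com/yy/rank/home/{}-8888.html'.format(str(i)) for i in range(1, 2)]
    for url in urls:
        get_info(url)
        # Pause between pages to avoid hammering the server
        time.sleep(5)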

In the code above, the first step is the import: from bs4 import BeautifulSoup.
Next, the headers are set.
Then, in soup = BeautifulSoup(wb_data.text, 'lxml'), BeautifulSoup is called
with lxml specified as the parser.
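
As a minimal sketch of the parser argument, the snippet below parses a small inline HTML string rather than a live page. The sample markup, the helper names pick_parser and make_soup, and the fallback to Python's built-in html.parser when lxml is not installed are all assumptions made for illustration; they are not part of the original code.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for wb_data.text
SAMPLE_HTML = "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>"


def pick_parser():
    """Return 'lxml' if it is installed, otherwise the built-in 'html.parser'."""
    try:
        import lxml  # noqa: F401
        return "lxml"
    except ImportError:
        return "html.parser"


def make_soup(html):
    # The second argument names the parser BeautifulSoup should use
    return BeautifulSoup(html, pick_parser())


soup = make_soup(SAMPLE_HTML)
print(soup.title.get_text())
print(soup.p.get_text())
```

Naming the parser explicitly keeps the result consistent across machines, since BeautifulSoup otherwise picks whichever parser happens to be installed.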
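Then, in
ranks = soup.select('span.pc_temp_num')
titles = soup.select('div.pc_temp_songlist ul li a')
and the like, the arguments are CSS selectors (not XPath expressions). To find them, open the page in Chrome and use the Inspect feature on the elements of interest.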
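
The selectors can be tried out offline before touching the real site. The following is a rough sketch: the HTML fragment is invented to mimic the class names used in the selectors and is not copied from the actual page, and html.parser is used here only so the example does not depend on lxml being installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the class names used in the selectors
SAMPLE_HTML = """
<div class="pc_temp_songlist">
  <ul>
    <li>
      <span class="pc_temp_num"> 1 </span>
      <a href="#">Singer A - Song A</a>
      <span class="pc_temp_tips_r"><span> 3:45 </span></span>
    </li>
    <li>
      <span class="pc_temp_num"> 2 </span>
      <a href="#">Singer B - Song B</a>
      <span class="pc_temp_tips_r"><span> 4:10 </span></span>
    </li>
  </ul>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# select() takes a CSS selector and returns a list of matching tags
ranks = soup.select("span.pc_temp_num")
titles = soup.select("div.pc_temp_songlist ul li a")
times = soup.select("span.pc_temp_tips_r span")

rank_texts = [tag.get_text().strip() for tag in ranks]
title_texts = [tag.get_text() for tag in titles]
time_texts = [tag.get_text().strip() for tag in times]
print(rank_texts, title_texts, time_texts)
```

A space between two parts of a selector means "any descendant", so "span.pc_temp_tips_r span" matches the inner span nested inside the outer one.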
Then a loop prints out the data; note that strip is used inside it to remove the surrounding whitespace.
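
The loop step can be sketched on plain strings, without any network access. The function name build_records and the sample values are hypothetical. Note one deliberate swap: this sketch uses str.partition instead of the split('-') call in the original, so that a hyphen inside a song name is not cut off.

```python
def build_records(ranks, titles, times):
    """Combine three parallel lists of raw strings into a list of dicts."""
    records = []
    # zip pairs up the i-th item of each list and stops at the shortest list
    for rank, title, song_time in zip(ranks, titles, times):
        # partition splits on the first '-' only
        singer, _, song = title.partition("-")
        records.append({
            "rank": rank.strip(),
            "singer": singer.strip(),
            "song": song.strip(),
            "time": song_time.strip(),
        })
    return records


records = build_records(
    ["  1\n", " 2 "],
    ["Singer A - Song A", "Singer B - Song-B"],
    [" 3:45 ", "4:10\n"],
)
for record in records:
    print(record)
```

Because strip with no arguments removes spaces, tabs, and newlines from both ends, it cleans up the indentation that usually surrounds text pulled out of HTML.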
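Then
urls = ['http://www.kugou.com/yy/rank/home/{}-8888.html'.format(str(i)) for i in range(1, 2)]
is a characteristically Pythonic construct: a list comprehension over a URL template, in which {} is the placeholder that gets replaced by the argument passed to format. Note that range(1, 2) yields only the value 1, so as written the list contains just the first page; widen the range to fetch more pages.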
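
A small sketch of the URL template idea follows. The helper name build_urls and the page range 1 to 3 are illustrative assumptions; only the template string itself comes from the code above.

```python
URL_TEMPLATE = "http://www.kugou.com/yy/rank/home/{}-8888.html"


def build_urls(first_page, last_page):
    """Return the page URLs for first_page..last_page, inclusive."""
    # format() converts the integer itself, so str(page) is not required
    return [URL_TEMPLATE.format(page) for page in range(first_page, last_page + 1)]


urls = build_urls(1, 3)
for url in urls:
    print(url)
```

Since range excludes its upper bound, the helper adds 1 to last_page so that both ends of the requested span are included.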