Scraping images with Python (from thumbURL fields and from HTML tags)

When the page source shows that each image URL follows a "thumbURL" key, use this code:

# Use this code when the page source shows each image URL after a "thumbURL" key.

import os
import re

import requests

headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0',
    'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
    # 'br' is left out: requests only decodes Brotli when the brotli package is installed
    'Accept-Encoding':'gzip, deflate',
    'Accept-Language':'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2'
}

url = input("Enter the URL of the page whose images you want to save: ")
response = requests.get(url, headers=headers)
print(response)
print(response.status_code)

file = input("Enter the name of the folder to save the images in: ")

os.makedirs(f'./{file}', exist_ok=True)
# Create the directory that will hold the images.
# def makedirs(name, mode=0o777, exist_ok=False):
# Parameters:
#     name: path of the directory to create.
#     mode: permission mode of the directory; the default is octal 777, similar to chmod().
#     exist_ok: optional. If False, FileExistsError is raised when the directory already exists; if True,
#         no exception is raised in that case. The default is False.

html = response.text
image_url_list = re.findall('"thumbURL":"(.*?)",', html, re.S)
# Finds the image URLs that follow each thumbURL key (I have not learned regular expressions yet)

# print(image_url_list)
q = 0
for url in image_url_list:
    # print(url)
    res = requests.get(url)
    picture = res.content
    q += 1
    # os.path.join keeps the path portable; a hard-coded '\\' only works on Windows
    with open(os.path.join(file, f'{q}.jpg'), mode='wb') as f:
        f.write(picture)
    # Save the image in the chosen folder, using the counter q as the file name
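
Since the regular expression above is the least explained step, here is a minimal sketch of what it does, run on a small made-up string instead of a live page. The function name `extract_thumb_urls`, the sample text, and the example.com URLs are assumptions for illustration only. The pattern also differs slightly from the one above: it stops at the closing quote rather than requiring a trailing comma, and it undoes `\/` escapes, which some pages use inside embedded JSON.

```python
import re

# "(.*?)" is a non-greedy group: it captures as few characters as possible,
# so each match ends at the first double quote after "thumbURL":"
THUMB_PATTERN = re.compile(r'"thumbURL":"(.*?)"')


def extract_thumb_urls(text):
    """Return every thumbURL value found in text, in order of appearance."""
    urls = THUMB_PATTERN.findall(text)
    # Embedded JSON sometimes escapes slashes as \/ ; turn them back into /
    return [u.replace('\\/', '/') for u in urls]


# Hypothetical sample, invented for this example
sample = (
    '{"data":[{"thumbURL":"https://example.com/a.jpg","width":100},'
    '{"thumbURL":"https:\\/\\/example.com\\/b.jpg","width":200}]}'
)

print(extract_thumb_urls(sample))
```

Without the `?`, a greedy `(.*)` would run to the last double quote on the line and swallow everything in between, which is why the non-greedy form is used here.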

When the source returned by requests.get is an HTML document in which each line is a tag, use this code:

# Use this code when the source returned by requests.get is an HTML document with one tag per line.
import os

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0',
    'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
    # 'br' is left out: requests only decodes Brotli when the brotli package is installed
    'Accept-Encoding':'gzip, deflate',
    'Accept-Language':'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2'
}

url = input("Enter the URL of the page whose images you want to save: ")
response = requests.get(url, headers=headers)
print(response)
print(response.status_code)

file = input("Enter the name of the folder to save the images in: ")
# response=requests.get('https://www.umei.cc/meinvtupian/')
response.encoding = 'utf-8'
# print(response.text)
soup = BeautifulSoup(response.text, 'html.parser')
# print(soup)

# Images go into ./图片/<folder>; 图片 means "pictures"
os.makedirs(f'./图片/{file}', exist_ok=True)

lis = soup.find_all('div', class_="taotu-main")
# print(a)
print("*********")

q = 0

t = 0
for l in lis:
    if t == 0:
        print(l)  # print only the first matching div, for inspection
    t += 1
    p = l.find_all('img')
    for i in p:
        pic = i.get('data-original')
        if not pic:
            continue  # skip <img> tags that have no data-original attribute
        print(pic)
        res = requests.get(pic)
        picture = res.content
        q += 1
        # os.path.join keeps the path portable; a hard-coded '\\' only works on Windows
        with open(os.path.join('图片', file, f'{q}.jpg'), mode='wb') as f:
            f.write(picture)
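
The loop above assumes every `data-original` value is a complete URL ending in a JPEG. As a hedged sketch of how that step could be made more forgiving, the helper below (the name `build_targets` and the example.com addresses are hypothetical, not part of the original script) resolves relative links against the page URL, skips missing attributes, and keeps the file extension found in the URL, falling back to `.jpg`. It uses only the standard library and does no downloading.

```python
from pathlib import Path
from urllib.parse import urljoin, urlparse


def build_targets(page_url, srcs, folder):
    """Pair each usable image URL with a numbered local path inside folder."""
    targets = []
    for src in srcs:
        if not src:
            continue  # the <img> tag had no usable attribute value
        # urljoin handles absolute, relative and protocol-relative (//host/...) links
        full_url = urljoin(page_url, src)
        # Reuse the extension from the URL path; assume .jpg when there is none
        suffix = Path(urlparse(full_url).path).suffix or '.jpg'
        targets.append((full_url, Path(folder) / f'{len(targets) + 1}{suffix}'))
    return targets


# Hypothetical attribute values, invented for this example
targets = build_targets(
    'https://example.com/gallery/',
    ['/img/a.png', None, 'b.jpg', '//cdn.example.com/c'],
    'pics',
)
for image_url, path in targets:
    print(image_url, '->', path)
```

Each `(url, path)` pair could then be passed to `requests.get` and `open(path, 'wb')` in the same way as in the loop above.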
