Converting Word to HTML in Java

1. Add the Maven dependencies

<properties>
    <poi.version>5.2.3</poi.version>
    <xhtml.version>2.0.4</xhtml.version>
</properties>

<!-- Word to HTML: .doc (Word 97-2003), via POI's HWPF -->
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-scratchpad</artifactId>
    <version>${poi.version}</version>
</dependency>
<!-- Word to HTML: .docx (Word 2007+), via XDocReport -->
<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>fr.opensagres.poi.xwpf.converter.xhtml</artifactId>
    <version>${xhtml.version}</version>
</dependency>
<!-- Office 2007+ (OOXML) document support -->
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>${poi.version}</version>
</dependency>
<!-- Office 97-2003 (OLE2) document support -->
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>${poi.version}</version>
</dependency>
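If a conversion later fails with a NoClassDefFoundError, the first thing to check is whether each of these artifacts actually made it onto the classpath. A minimal sketch, assuming you want to probe one representative class per artifact: the helper name `DependencyCheck` is hypothetical, and the choice of class for each artifact is an assumption (three of them are the classes used in the code below; `POIFSFileSystem` stands in for core `poi`).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: reports whether one representative class per artifact can be loaded.
final class DependencyCheck {

    // Artifact -> a class that artifact is expected to provide (assumed mapping)
    static final Map<String, String> REPRESENTATIVE_CLASSES = new LinkedHashMap<>();

    static {
        REPRESENTATIVE_CLASSES.put("poi-scratchpad", "org.apache.poi.hwpf.HWPFDocument");
        REPRESENTATIVE_CLASSES.put("fr.opensagres.poi.xwpf.converter.xhtml",
                "fr.opensagres.poi.xwpf.converter.xhtml.XHTMLConverter");
        REPRESENTATIVE_CLASSES.put("poi-ooxml", "org.apache.poi.xwpf.usermodel.XWPFDocument");
        REPRESENTATIVE_CLASSES.put("poi", "org.apache.poi.poifs.filesystem.POIFSFileSystem");
    }

    private DependencyCheck() {}

    // True if the class can be located by this class loader (it is not initialized)
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, DependencyCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Artifact -> whether its representative class was found, in declaration order
    static Map<String, Boolean> report() {
        Map<String, Boolean> result = new LinkedHashMap<>();
        REPRESENTATIVE_CLASSES.forEach((artifact, cls) -> result.put(artifact, isPresent(cls)));
        return result;
    }
}
```

Calling `DependencyCheck.report().forEach((a, ok) -> System.out.println(a + ": " + (ok ? "found" : "MISSING")))` at startup prints one line per artifact. Loading with `initialize = false` keeps the probe from running any static initializers in the libraries.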

2. Java code

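import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.StringWriter;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

import fr.opensagres.poi.xwpf.converter.xhtml.Base64EmbedImgManager;
import fr.opensagres.poi.xwpf.converter.xhtml.XHTMLConverter;
import fr.opensagres.poi.xwpf.converter.xhtml.XHTMLOptions;

// The methods below belong to a utility class. `log` is assumed to be an SLF4J
// logger declared on that class (for example generated by Lombok's @Slf4j).

    /**
     * Convert a Word 2007+ (.docx) file to HTML.
     * @param filePath path to the .docx file
     * @return the HTML as a String, or null if the conversion fails
     */
    public static String docxToHtml(String filePath) {
        try (InputStream in = Files.newInputStream(Paths.get(filePath));
             XWPFDocument docxDocument = new XWPFDocument(in);
             ByteArrayOutputStream htmlStream = new ByteArrayOutputStream()) {
            XHTMLOptions options = XHTMLOptions.create();
            // Whether to skip unused styles (false: keep all styles)
            options.setIgnoreStylesIfUnused(false);
            // Fragment mode: the output is wrapped in a <div> tag instead of a full HTML document
            options.setFragment(true);
            // Embed images inline as Base64
            options.setImageManager(new Base64EmbedImgManager());
            // Convert to HTML
            XHTMLConverter.getInstance().convert(docxDocument, htmlStream, options);
            return htmlStream.toString();
        } catch (Exception e) {
            log.error("Error while converting Word to HTML!", e);
        }
        return null;
    }

    /**
     * Convert a Word 97-2003 (.doc) file to HTML.
     * @param filePath path to the .doc file
     * @return the HTML as a String, or null if the conversion fails
     */
    public static String docToHtml(String filePath) {
        try (InputStream in = Files.newInputStream(Paths.get(filePath));
             HWPFDocument document = new HWPFDocument(in);
             StringWriter writer = new StringWriter()) {
            WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
                    DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());
            // Embed images as Base64 data URIs using each picture's own MIME type
            // (JDK java.util.Base64 is used here instead of commons-codec)
            wordToHtmlConverter.setPicturesManager((bytes, pictureType, suggestedName, widthInches, heightInches) ->
                    "data:" + pictureType.getMime() + ";base64," + Base64.getEncoder().encodeToString(bytes));
            wordToHtmlConverter.processDocument(document);
            org.w3c.dom.Document htmlDocument = wordToHtmlConverter.getDocument();
            // Serialize the generated DOM as HTML
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
            serializer.setOutputProperty(OutputKeys.INDENT, "yes");
            serializer.setOutputProperty(OutputKeys.METHOD, "html");
            serializer.transform(new DOMSource(htmlDocument), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            log.error("Error while converting Word to HTML!", e);
        }
        return null;
    }

    /**
     * Convert a Word file to HTML, choosing the converter from the file extension.
     * @param filePath local path to the Word file
     * @return the HTML string on success; null on failure or for an unsupported format
     */
    public static String autoWord2Html(String filePath) {
        // Everything after the last '.' is treated as the extension
        int lastIndexOf = filePath.lastIndexOf(".");
        String suffix = filePath.substring(lastIndexOf + 1);
        if ("doc".equalsIgnoreCase(suffix)) {
            return docToHtml(filePath);
        } else if ("docx".equalsIgnoreCase(suffix)) {
            return docxToHtml(filePath);
        } else {
            log.info("Unsupported file format: only .docx and .doc documents are supported!");
            return null;
        }
    }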
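Because docxToHtml runs in fragment mode, its result is a `<div>` fragment rather than a complete page. If you want to save it as a standalone .html file, one option is to wrap it in a minimal page that declares UTF-8. A sketch under that assumption; `HtmlPage` and its methods are hypothetical names, not part of POI or XDocReport.

```java
// Hypothetical helper: wraps an HTML fragment in a minimal UTF-8 HTML5 page.
final class HtmlPage {

    private HtmlPage() {}

    // Escape the characters that are special in HTML text content
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // The fragment is inserted as-is (it is already HTML); only the title is escaped
    static String wrap(String fragment, String title) {
        return "<!DOCTYPE html>\n"
                + "<html>\n<head>\n<meta charset=\"utf-8\">\n"
                + "<title>" + escape(title) + "</title>\n"
                + "</head>\n<body>\n"
                + fragment
                + "\n</body>\n</html>\n";
    }
}
```

Since docxToHtml returns null on failure, check the result before wrapping it, then write the page with an explicit charset, e.g. `Files.write(Paths.get("out.html"), HtmlPage.wrap(html, "Report").getBytes(StandardCharsets.UTF_8))`.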

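autoWord2Html trusts the file extension, so a .docx renamed to .doc (or the reverse) is sent to the wrong converter. A minimal sketch of an alternative, assuming you would rather inspect the first bytes: Word 97-2003 files are OLE2 compound files with a fixed 8-byte signature, and .docx files are ZIP packages starting with "PK\3\4". Note the limits: a ZIP header only proves the file is a ZIP (an .xlsx matches too), and OLE2 also covers .xls and .ppt. `WordFormatSniffer` is a hypothetical name; Apache POI's own `FileMagic` class covers similar ground if you prefer not to hand-roll this.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper: guesses the Word format from the leading bytes instead of the extension.
final class WordFormatSniffer {

    enum Format { DOC, DOCX, UNKNOWN }

    // OLE2 compound file signature (used by .doc)
    private static final byte[] OLE2 = {
            (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1 };
    // ZIP local file header "PK\3\4" (a .docx is a ZIP package)
    private static final byte[] ZIP = { 0x50, 0x4B, 0x03, 0x04 };

    private WordFormatSniffer() {}

    static Format sniff(byte[] header) {
        if (startsWith(header, OLE2)) return Format.DOC;
        if (startsWith(header, ZIP)) return Format.DOCX;
        return Format.UNKNOWN;
    }

    // Reads at most 8 bytes from the file (loop kept Java 8 compatible)
    static Format sniff(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] header = new byte[OLE2.length];
            int n = 0;
            while (n < header.length) {
                int r = in.read(header, n, header.length - n);
                if (r < 0) break;
                n += r;
            }
            return sniff(Arrays.copyOf(header, n));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

In autoWord2Html you could switch on `WordFormatSniffer.sniff(Paths.get(filePath))` instead of the suffix, keeping the extension check only as a fallback for UNKNOWN.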