
Generating PDFs and Images from HTML with Node

2024-03-15 10:03:47, Front-End Knowledge, by 前端哥

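A while ago I was given the task of generating a PDF from a web page.

My first solution used html2canvas to render the page to an image, then jspdf to turn that image into a PDF.

This approach has three drawbacks:

  1. The images html2canvas produces are fairly blurry
  2. html2canvas and jspdf are very CPU-intensive and can freeze lower-end machines
  3. Because the PDF content is built from an image, text cannot be selected or copied

Later I learned about Puppeteer. Puppeteer is a Node library that provides a set of APIs for controlling Chrome; put simply, it drives a headless Chrome browser. Since it is a real browser, Puppeteer can do anything we could do by hand in a browser:

  • Generate screenshots or PDFs of web pages, including SPAs such as Vue apps
  • Advanced crawling of pages with large amounts of asynchronously rendered content
  • Simulate keyboard input, automatic form submission, logins and so on, for automated UI testing
  • Capture a site's timeline trace to help diagnose performance problems

We use midwayjs as the HTTP service; it calls Puppeteer to generate the image or PDF and returns the binary to the browser.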
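
To make that flow concrete, here is a minimal sketch of rendering a URL to PDF bytes. `renderPdf`, `PageLike` and `BrowserLike` are hypothetical names introduced for illustration; the two interfaces only mirror the few Puppeteer methods used (`newPage`, `goto`, `pdf`, `close`) so the snippet stands on its own. Waiting for `networkidle0` is an assumption about how to give an SPA time to finish rendering, not something the service code in this article does.

```typescript
// Minimal structural types covering only the Puppeteer methods used here,
// so the sketch compiles without puppeteer-core installed.
interface PageLike {
  goto(
    url: string,
    options?: { waitUntil?: 'load' | 'domcontentloaded' | 'networkidle0' | 'networkidle2' }
  ): Promise<unknown>;
  pdf(options?: { printBackground?: boolean }): Promise<Uint8Array>;
  close(): Promise<void>;
}

interface BrowserLike {
  newPage(): Promise<PageLike>;
}

// renderPdf is a hypothetical helper name, not part of Puppeteer.
async function renderPdf(browser: BrowserLike, url: string): Promise<Uint8Array> {
  const page = await browser.newPage();
  try {
    // 'networkidle0' resolves once there have been no network connections
    // for at least 500 ms, which lets an SPA fetch its data and render.
    await page.goto(url, { waitUntil: 'networkidle0' });
    return await page.pdf({ printBackground: true });
  } finally {
    // Close the tab whether or not rendering succeeded.
    await page.close();
  }
}
```

A browser returned by `puppeteer.launch` should fit the `BrowserLike` shape, so it could be passed in directly; that is worth confirming against the Puppeteer version in use.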

Setting up the environment

  • Create the project; Node 18.0.0 or later is required

    npm init midway@latest -y
    cd G:\workspaces\export-service
    
  • Install puppeteer-core and carlo

    npm i puppeteer-core -S
    npm i carlo -S
    

    Installing puppeteer-core and carlo requires a proxy. To configure a proxy for npm:

    # Set the HTTP proxy
    npm config set proxy http://127.0.0.1:7890
    
    # Set the HTTPS proxy
    npm config set https-proxy http://127.0.0.1:7890
    
    # Remove the proxy settings
    npm config delete proxy
    npm config delete https-proxy
    
    # Show the current proxy settings
    npm config get proxy
    npm config get https-proxy
    

Exporting PDFs and images

  • Create puppeteer.service.ts

    import { Provide } from '@midwayjs/core';
    import { IImageOptions } from '../interface';
    import puppeteer from 'puppeteer-core';
    import * as findChrome from 'carlo/lib/find_chrome';
    
    @Provide()
    export class PuppeteerService {
      async getImage(data: IImageOptions) {
        // Launch a Puppeteer browser instance
        const findChromePath = await findChrome({});
        const executablePath = findChromePath.executablePath;
        const browser = await puppeteer.launch({
          args: [
            // Required for Docker version of Puppeteer
            '--no-sandbox',
            '--disable-setuid-sandbox',
            // This will write shared memory files into /tmp instead of /dev/shm,
            // because Docker’s default for /dev/shm is 64MB
            '--disable-dev-shm-usage',
          ],
          headless: true,
          executablePath,
        });
        const page = await browser.newPage();
        if (data.cookies) {
          await page.setCookie(...data.cookies);
        }
        await page.goto(data.url);
        const buffer = await page.screenshot({ fullPage: true, type: 'jpeg' });
        await browser.close();
        return buffer;
      }
    
      async getPdf(data: IImageOptions) {
        // Launch a Puppeteer browser instance
        const findChromePath = await findChrome({});
        const executablePath = findChromePath.executablePath;
        const browser = await puppeteer.launch({
          args: [
            // Required for Docker version of Puppeteer
            '--no-sandbox',
            '--disable-setuid-sandbox',
            // This will write shared memory files into /tmp instead of /dev/shm,
            // because Docker’s default for /dev/shm is 64MB
            '--disable-dev-shm-usage',
          ],
          headless: true,
          executablePath,
        });
        const page = await browser.newPage();
        if (data.cookies) {
          await page.setCookie(...data.cookies);
        }
        await page.goto(data.url);
        const buffer = await page.pdf({
          printBackground: true,
          margin: {
            top: 20,
            bottom: 20,
          },
        });
        await browser.close();
        return buffer;
      }
    }
    
  • Update interface.ts

    export interface IImageOptions {
      url: string;
      cookies?: {
        name: string;
        value: string;
        path?: string;
        domain?: string;
      }[];
    }
    
    
  • Create api.controller.ts

    import { Inject, Controller, Get } from '@midwayjs/core';
    import { Context } from '@midwayjs/koa';
    import { PuppeteerService } from '../service/puppeteer.service';
    
    @Controller('/')
    export class APIController {
      @Inject()
      ctx: Context;
    
      @Inject()
      puppeteerService: PuppeteerService;
    
      @Get('/img')
      async getImg() {
        const buffer = await this.puppeteerService.getImage({
          url: 'https://www.baidu.com',
        });
        this.ctx.type = 'image/jpeg';
        // To download the image instead of displaying it inline:
        // this.ctx.set('content-disposition', 'attachment; filename="baidu.jpg"');
        this.ctx.body = buffer;
      }
    
      @Get('/pdf')
      async getPdf() {
        const buffer = await this.puppeteerService.getPdf({
          url: 'https://www.baidu.com',
        });
        this.ctx.type = '.pdf';
        // this.ctx.set('Content-Type', 'application/octet-stream');
        // To download the PDF instead of displaying it inline:
        // this.ctx.set('content-disposition', 'attachment; filename="baidu.pdf"');
        this.ctx.body = buffer;
      }
    }
    
  • File structure

    [Screenshot: project file structure]

  • Start

    npm run dev
    
  • Test

    • Export an image

      [Screenshot: exported image]

    • Export a PDF

      [Screenshot: exported PDF]
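
One thing the service methods above leave open: `browser.close()` only runs when every earlier step succeeds, so a failing `page.goto` would leave a Chrome process running. A possible guard is a small wrapper around launch and close. `withBrowser` and `ClosableBrowser` are hypothetical names for this sketch, and the launcher is passed in so the snippet does not depend on puppeteer-core.

```typescript
// Anything with an async close() qualifies; a Puppeteer Browser has one.
interface ClosableBrowser {
  close(): Promise<void>;
}

// withBrowser is a hypothetical helper: it launches, runs the task, and
// always closes the browser afterwards.
async function withBrowser<B extends ClosableBrowser, T>(
  launch: () => Promise<B>,
  task: (browser: B) => Promise<T>
): Promise<T> {
  const browser = await launch();
  try {
    return await task(browser);
  } finally {
    // Runs on success and on failure, so an error inside the task
    // cannot leave the browser process behind.
    await browser.close();
  }
}
```

In the service, the launcher would be something like `() => puppeteer.launch({ ... })`, and the task would hold the `newPage`, `goto` and `screenshot` or `pdf` steps.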

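Problem

Load testing with JMeter exhausted the memory and caused the service to restart.

Cause: every request creates a new puppeteer instance. Creating a puppeteer instance amounts to opening a Chrome browser, which is a very expensive operation.

Optimization

Use the generic-pool connection pool to optimize this.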
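
For readers new to pooling, the idea can be sketched in a few lines: keep a bounded set of reusable resources, hand one out on acquire, and take it back on release. `TinyPool` below is a hypothetical teaching example, not how generic-pool is implemented; it leaves out validation, idle eviction and timeouts, all of which generic-pool provides.

```typescript
// A deliberately tiny pool that illustrates acquire/release only.
class TinyPool<T> {
  private idle: T[] = [];
  private waiters: Array<(resource: T) => void> = [];
  private created = 0;

  constructor(private create: () => Promise<T>, private max: number) {}

  async acquire(): Promise<T> {
    const free = this.idle.pop();
    if (free !== undefined) return free; // reuse an idle resource first
    if (this.created < this.max) {
      // Reserve the slot before awaiting, so concurrent callers cannot exceed max.
      this.created += 1;
      try {
        return await this.create();
      } catch (err) {
        this.created -= 1; // give the slot back if creation failed
        throw err;
      }
    }
    // Pool exhausted: wait until another caller releases a resource.
    return new Promise<T>(resolve => this.waiters.push(resolve));
  }

  release(resource: T): void {
    const next = this.waiters.shift();
    if (next) next(resource); // hand over to the longest-waiting caller
    else this.idle.push(resource);
  }
}
```

With `max` set to 2, a third concurrent `acquire()` waits until one of the first two callers calls `release()`, which is the behavior that keeps the number of browser contexts bounded under load.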

  • Install generic-pool

    npm i generic-pool -S
    
  • Create puppeteer-pool.ts

    import puppeteer, { Browser, BrowserContext } from 'puppeteer-core';
    import { createPool, Pool } from 'generic-pool';
    import * as findChrome from 'carlo/lib/find_chrome';
    
    interface IPuppeteerPool {
      max?: number;
      min?: number;
      maxUses?: number;
      testOnBorrow?: boolean;
      autostart?: boolean;
      idleTimeoutMillis?: number;
      evictionRunIntervalMillis?: number;
      puppeteerArgs?: Record<string, unknown>;
      validator?: (instance: BrowserContext) => Promise<boolean>;
    }
    export class PuppeteerPool {
      private static _instance: PuppeteerPool;
      private _options: IPuppeteerPool;
      private _useCount = 0;
      private _browser: Browser;
      private _pool: Pool<BrowserContext>;
    
      public static async getInstance(options: IPuppeteerPool = {}) {
        if (!this._instance) {
          this._instance = new PuppeteerPool(options);
          await this._instance.init();
        }
        return this._instance;
      }
    
      /**
       * Initialize a Puppeteer pool
       * @param {Object} [options={}] Pool configuration
       * @param {Number} [options.max=10] Maximum number of puppeteer instances to create. If you set it, make sure to drain the pool when the application shuts down: pool.drain().then(()=>pool.clear())
       * @param {Number} [options.min=2] Minimum number of instances kept alive in the pool
       * @param {Number} [options.maxUses=2028] Maximum number of times an instance may be reused before it is recycled. 0 disables the check
       * @param {Boolean} [options.testOnBorrow=true] Whether the pool validates an instance before handing it to the caller
       * @param {Boolean} [options.autostart=false] Whether to create instances as soon as the pool is initialized
       * @param {Number} [options.idleTimeoutMillis=3600000] Close an instance that has been idle for 60 minutes
       * @param {Number} [options.evictionRunIntervalMillis=180000] Check instances for idleness every 3 minutes
       * @param {Object} [options.puppeteerArgs={}] Arguments for puppeteer.launch
       * @param {Function} [options.validator=(instance)=>Promise.resolve(true))] Custom validation; receives the acquired instance
       * @param {Object} [options.otherConfig={}] Any remaining options // For all opts, see opts at https://github.com/coopernurse/node-pool#createpool
       */
      constructor(options: IPuppeteerPool = {}) {
        this._options = options;
      }
    
      public async init() {
        await this._initBrowser();
        this._initPool();
      }
    
      private async _initBrowser() {
        // Launch a Puppeteer browser instance
        const findChromePath = await findChrome({});
        const executablePath = findChromePath.executablePath;
        this._browser = await puppeteer.launch({
          args: [
            // Required for Docker version of Puppeteer
            '--no-sandbox',
            '--disable-setuid-sandbox',
            // This will write shared memory files into /tmp instead of /dev/shm,
            // because Docker’s default for /dev/shm is 64MB
            '--disable-dev-shm-usage',
          ],
          headless: true,
          executablePath,
        });
      }
    
      private _initPool() {
        const {
          max = 10,
          min = 2,
          maxUses = 2028,
          testOnBorrow = true,
          autostart = false,
          idleTimeoutMillis = 3600000,
          evictionRunIntervalMillis = 180000,
          validator = (instance: BrowserContext) => Promise.resolve(true),
          ...otherConfig
        } = this._options;
    
        const factory = {
          create: async () => {
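            // Create an incognito browser context on the shared browser
            // (Puppeteer v22 and later rename this method to createBrowserContext)
            const instance = this._browser;
            // Reset the use counter to 0 whenever a new context is created
            this._useCount = 0;
            return await instance.createIncognitoBrowserContext();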
          },
          destroy: async (instance: BrowserContext) => {
            await instance.close();
          },
          validate: async (instance: BrowserContext) => {
            // Run the custom validator and check the use count; returning false marks the instance as unusable
            const valid = await validator(instance);
            return valid && (maxUses <= 0 || this._useCount < maxUses);
          },
        };
        const config = {
          max,
          min,
          testOnBorrow,
          autostart,
          idleTimeoutMillis,
          evictionRunIntervalMillis,
          ...otherConfig,
        };
        this._pool = createPool(factory, config);
        const genericAcquire = this._pool.acquire.bind(this._pool);
        // Wrap the pool's acquire method so that each acquisition increments the use count
        this._pool.acquire = () =>
          genericAcquire().then((instance: BrowserContext) => {
            this._useCount += 1;
            return instance;
          });
      }
    
      public async use<T>(fn: (instance: BrowserContext) => Promise<T>) {
        let resource: BrowserContext;
        return this._pool
          .acquire()
          .then(async r => {
            resource = r;
            return resource;
          })
          .then(fn)
          .then(
            result => {
              // Release the instance back to the pool whether or not the caller's task succeeded
              this._pool.release(resource);
              return result;
            },
            err => {
              this._pool.release(resource);
              throw err;
            }
          );
      }
    
      get pool(): Pool<BrowserContext> {
        return this._pool;
      }
    }
    
    
  • Update puppeteer.service.ts

    import { Provide } from '@midwayjs/core';
    import { IImageOptions } from '../interface';
    import { PuppeteerPool } from '../util/puppeteer-pool';
    
    @Provide()
    export class PuppeteerService {
      async getImage(data: IImageOptions) {
        const pool = await PuppeteerPool.getInstance();
        return pool.use(async instance => {
          const page = await instance.newPage();
          if (data.cookies) {
            await page.setCookie(...data.cookies);
          }
          await page.goto(data.url);
          const buffer = await page.screenshot({ fullPage: true, type: 'jpeg' });
          await page.close();
          return buffer;
        });
      }
    
      async getPdf(data: IImageOptions) {
        const pool = await PuppeteerPool.getInstance();
        return pool.use(async instance => {
          const page = await instance.newPage();
          if (data.cookies) {
            await page.setCookie(...data.cookies);
          }
          await page.goto(data.url);
          const buffer = await page.pdf({
            printBackground: true,
            margin: {
              top: 20,
              bottom: 20,
            },
          });
          await page.close();
          return buffer;
        });
      }
    }
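
One detail of `getInstance` in puppeteer-pool.ts is worth a second look: `_instance` is assigned before `init()` has finished, so a second request that arrives during startup could receive a pool whose `_pool` is not yet set. A common way around this is to cache the initialization promise rather than the instance. The sketch below uses a hypothetical `SlowResource` class as a stand-in for `PuppeteerPool`, with a timer in place of the browser launch.

```typescript
// Hypothetical stand-in for PuppeteerPool: init() is slow and must run exactly once.
class SlowResource {
  static initCount = 0;
  ready = false;

  async init(): Promise<void> {
    SlowResource.initCount += 1;
    await new Promise(resolve => setTimeout(resolve, 10)); // simulates launching Chrome
    this.ready = true;
  }
}

let instancePromise: Promise<SlowResource> | undefined;

function getInstance(): Promise<SlowResource> {
  if (!instancePromise) {
    // Store the promise synchronously, before any await, so every caller
    // shares the same in-flight initialization.
    instancePromise = (async () => {
      const instance = new SlowResource();
      await instance.init();
      return instance;
    })().catch(err => {
      instancePromise = undefined; // allow a retry after a failed start
      throw err;
    });
  }
  return instancePromise;
}
```

Callers still write `await getInstance()`, but they only ever observe a fully initialized instance, and `init()` runs once no matter how many requests arrive at the same time.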
    

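Deployment

Deployment uses Docker. Before deploying, download the Chrome browser package.

After downloading, place the file in the project at build/google-chrome-stable_current_amd64.deb

  • Create the Docker configuration file Dockerfile in the project root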

    FROM node:18 AS build
    
    RUN npm config set https-proxy http://192.168.56.1:7890
    
    WORKDIR /app
    
    COPY ./package.json /app
    COPY ./package-lock.json /app
    COPY ./tsconfig.json /app
    COPY ./.editorconfig /app
    COPY ./.eslintrc.json /app
    COPY ./.prettierrc.js /app
    COPY ./bootstrap.js /app
    COPY ./src /app/src
    
    RUN npm install
    
    RUN npm run build
    
    FROM node:18 AS chrome-stable
    
    WORKDIR /app
    
    COPY ./build/google-chrome-stable_current_amd64.deb /app/google-chrome-stable_current_amd64.deb
    
    RUN apt-get update && apt-get install -y fonts-liberation libasound2 fonts-ipafont-gothic fonts-wqy-zenhei fonts-thai-tlwg fonts-khmeros fonts-kacst fonts-freefont-ttf \
        libatk-bridge2.0-0 libgtk-3-0 libnspr4 libnss3 libx11-xcb1 libxss1 libxtst6 lsb-release xdg-utils libu2f-udev libvulkan1
    RUN dpkg -i /app/google-chrome-stable_current_amd64.deb
    
    RUN rm -rf /var/lib/apt/lists/*
    
    FROM chrome-stable
    
    RUN npm config set https-proxy http://192.168.56.1:7890
    
    WORKDIR /app
    
    COPY --from=build /app/dist ./dist
    # Copy the source code too, so that error stack traces point to the right lines
    COPY --from=build /app/src ./src
    COPY --from=build /app/bootstrap.js ./
    COPY --from=build /app/package.json ./
    COPY --from=build /app/package-lock.json ./
    
    ENV TZ="Asia/Shanghai"
    
    RUN npm install --production
    
    # Update this if the port changes
    EXPOSE 7001
    
    CMD ["npm", "run", "start"]
    
  • Create the docker-compose configuration file docker-compose/docker-compose.yml

    version: '3'
    
    services:
      export-service:
        build:
          context: ..
          dockerfile: Dockerfile
        image: export-service:latest
        container_name: export-service
        restart: always
        ports:
          - 7001:7001
    
  • Start command

    cd docker-compose
    sudo docker-compose up -d 
    

    Other commands

    # Build the image without using the cache
    sudo docker-compose build --no-cache
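
Inside the container the service still has to locate the Chrome binary installed from the .deb package. The article relies on carlo's `find_chrome` for that; as an alternative, a small resolver with an explicit override can make a missing binary fail with a clear message. This is only a sketch: `resolveChromePath` and the `CHROME_PATH` variable are hypothetical names, and the candidate paths are assumptions about where the package places Chrome, so they should be checked against the actual image.

```typescript
import { existsSync } from 'node:fs';

// Assumed install locations for the google-chrome-stable .deb package.
const DEFAULT_CANDIDATES = [
  '/usr/bin/google-chrome-stable',
  '/usr/bin/google-chrome',
];

// CHROME_PATH is a hypothetical environment variable chosen for this sketch.
function resolveChromePath(
  env: Record<string, string | undefined> = process.env,
  candidates: string[] = DEFAULT_CANDIDATES
): string {
  const fromEnv = env.CHROME_PATH;
  if (fromEnv && existsSync(fromEnv)) return fromEnv; // explicit override wins
  const found = candidates.find(path => existsSync(path));
  if (!found) {
    throw new Error('Chrome executable not found; set CHROME_PATH or install Chrome');
  }
  return found;
}
```

The returned string could then be passed as `executablePath` to `puppeteer.launch`, in place of the value obtained from `findChrome`.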
    


github:

If you found this helpful, please follow the link below and give the project a star on GitHub. Thank you!

Imprevia/export-service

When reposting, please credit the source or include this link: https://www.qianduange.cn//article/3806.html