GitHub repository
https://github.com/paperless-ngx/paperless-ngx
Docker image
https://hub.docker.com/r/paperlessngx/paperless-ngx
A community-supported, supercharged version of Paperless: scan, index, and archive all your paper documents
Features
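- Organize and index your scanned documents with tags, correspondents, types, and more.
- Performs OCR on your documents, adds selectable text to image-only documents, and adds tags, correspondents, and document types to your documents.
- Supports PDF documents, images, plain text files, and Office documents (Word, Excel, Powerpoint, and LibreOffice equivalents).
  - Office document support is optional and provided by Apache Tika (see the configuration docs).
- Paperless stores your documents directly on disk. Filenames and folders are managed by Paperless, and their format can be configured freely.
- Single-page application front end.
  - Includes a dashboard that shows basic statistics and has document upload.
  - Filtering by tags, correspondents, types, and more.
  - Custom views can be saved and displayed on the dashboard.
- Full-text search helps you find what you need.
  - Autocompletion suggests relevant words from your documents.
  - Results are sorted by relevance to your search query.
  - Highlighting shows which parts of a document matched the query.
  - Search for similar documents ("More like this").
- Email processing: Paperless adds documents from your email accounts.
  - Configure multiple accounts and set up filters for each account.
  - When adding documents from mail, Paperless can move the messages to a new folder, mark them as read, flag them as important, or delete them.
- Machine-learning-powered document matching.
  - Paperless-ngx learns from your documents; once you have stored a few documents in Paperless, it can automatically assign tags, correspondents, and types.
- Optimized for multi-core systems: Paperless-ngx consumes multiple documents in parallel.
- The integrated sanity checker makes sure your document archive is in good health.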
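In short, it digitizes paper documents.
The first candidates that come to mind are receipts and contracts, then manuals of all kinds, and after that magazines, books, and so on.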
Official documentation
https://docs.paperless-ngx.com/setup/#docker_script
docker-compose reference files
https://github.com/paperless-ngx/paperless-ngx/tree/main/docker/compose
Create the docker-compose.yml configuration file
# docker-compose file for running paperless from the Docker Hub.
# This file contains everything paperless needs to run.
# Paperless supports amd64, arm and arm64 hardware.
#
# All compose files of paperless configure paperless in the following way:
#
# - Paperless is (re)started on system boot, if it was running before shutdown.
# - Docker volumes for storing data are managed by Docker.
# - Folders for importing and exporting files are created in the same directory
# as this file and mounted to the correct folders inside the container.
# - Paperless listens on port 8000.
#
# In addition to that, this docker-compose file adds the following optional
# configurations:
#
# - Instead of SQLite (default), PostgreSQL is used as the database server.
#
# To install and update paperless with this file, do the following:
#
# - Copy this file as 'docker-compose.yml' and the files 'docker-compose.env'
# and '.env' into a folder.
# - Run 'docker-compose pull'.
# - Run 'docker-compose run --rm webserver createsuperuser' to create a user.
# - Run 'docker-compose up -d'.
#
# For more extensive installation and update instructions, refer to the
# documentation.
version: "3.4"
services:
  broker:
    image: docker.io/library/redis:7
    restart: unless-stopped
    volumes:
      - ./redisdata:/data

  db:
    image: docker.io/library/postgres:15
    restart: unless-stopped
    volumes:
      - ./pgdata:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: paperless

  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    restart: unless-stopped
    depends_on:
      - db
      - broker
    ports:
      # host port 3033 -> container port 8000
      - "3033:8000"
    healthcheck:
      test: ["CMD", "curl", "-fs", "-S", "--max-time", "2", "http://localhost:8000"]
      interval: 30s
      timeout: 10s
      retries: 5
    volumes:
      - ./data:/usr/src/paperless/data
      - ./media:/usr/src/paperless/media
      - ./export:/usr/src/paperless/export
      - ./consume:/usr/src/paperless/consume
    env_file: docker-compose.env
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_DBHOST: db
      PAPERLESS_ADMIN_USER: admin
      PAPERLESS_ADMIN_PASSWORD: admin

# Named volumes kept from the upstream template. They are unused here because
# the services above bind-mount ./data, ./media, ./pgdata and ./redisdata.
volumes:
  data:
  media:
  pgdata:
  redisdata:
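The compose file above bind-mounts six host directories next to docker-compose.yml. Docker would create missing ones on first start, typically owned by root, so it may be simpler to create them up front as your own user. A minimal sketch, assuming it is run from the folder that holds docker-compose.yml; the validation step is optional and only runs when docker-compose is installed:

```shell
# Create the host directories referenced by the bind mounts in docker-compose.yml
mkdir -p data media export consume pgdata redisdata

# Optional: let docker-compose parse the file and report YAML/indentation errors.
# Skipped silently when docker-compose or the compose file is not present.
if command -v docker-compose >/dev/null 2>&1 && [ -f docker-compose.yml ]; then
  docker-compose config -q && echo "docker-compose.yml parsed successfully"
fi
```

Because YAML is indentation-sensitive, the validation step is a cheap way to catch a copy-paste that lost its leading spaces.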
Create the docker-compose.env file
Copy the contents of the upstream file as-is
# The UID and GID of the user used to run paperless in the container. Set this
# to your UID and GID on the host so that you have write access to the
# consumption directory.
#USERMAP_UID=1000
#USERMAP_GID=1000
# Additional languages to install for text recognition, separated by a
# whitespace. Note that this is
# different from PAPERLESS_OCR_LANGUAGE (default=eng), which defines the
# language used for OCR.
# The container installs English, German, Italian, Spanish and French by
# default.
# See https://packages.debian.org/search?keywords=tesseract-ocr-&searchon=names&suite=buster
# for available languages.
#PAPERLESS_OCR_LANGUAGES=tur ces
###############################################################################
# Paperless-specific settings #
###############################################################################
# All settings defined in the paperless.conf.example can be used here. The
# Docker setup does not use the configuration file.
# A few commonly adjusted settings are provided below.
# This is required if you will be exposing Paperless-ngx on a public domain
# (if doing so please consider security measures such as reverse proxy)
#PAPERLESS_URL=https://paperless.example.com
# Adjust this key if you plan to make paperless available publicly. It should
# be a very long sequence of random characters. You don't need to remember it.
#PAPERLESS_SECRET_KEY=change-me
# Use this variable to set a timezone for the Paperless Docker containers. If not specified, defaults to UTC.
#PAPERLESS_TIME_ZONE=America/Los_Angeles
# The default language to use for OCR. Set this to the language most of your
# documents are written in.
#PAPERLESS_OCR_LANGUAGE=eng
# Set if accessing paperless via a domain subpath e.g. https://domain.com/PATHPREFIX and using a reverse-proxy like traefik or nginx
#PAPERLESS_FORCE_SCRIPT_NAME=/PATHPREFIX
#PAPERLESS_STATIC_URL=/PATHPREFIX/static/ # trailing slash required
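Every setting in the file above is commented out, which is fine for a first test on a home network. If you do want to follow its two recommendations (match the container user to your host user, and use a long random secret key), the sketch below appends active values to the file. It assumes the file is named docker-compose.env in the current directory; the ENV_FILE variable is only a convenience introduced here, not something Paperless reads.

```shell
# File to append to; override with ENV_FILE=... if yours lives elsewhere (hypothetical helper variable)
ENV_FILE="${ENV_FILE:-docker-compose.env}"

# 48 random bytes, base64-encoded, with newline and the characters = + / stripped
SECRET="$(head -c 48 /dev/urandom | base64 | tr -d '\n=+/')"

{
  echo "USERMAP_UID=$(id -u)"              # run the container as the current host user
  echo "USERMAP_GID=$(id -g)"              # ...and the current host group
  echo "PAPERLESS_SECRET_KEY=${SECRET}"    # long random value, no need to remember it
} >> "$ENV_FILE"
```

Running it twice appends the keys twice, so check the file for duplicate entries if you rerun it.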
Open the port in the firewall
sudo ufw allow 3033
Pull the images and start the services
docker-compose up
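Running docker-compose up in the foreground is handy for watching the first start, but it ties up the terminal. One alternative is to start the stack detached and poll the web UI until it answers. The helper below is a sketch, not part of Paperless: wait_for_http is a hypothetical name, and it reuses the same curl flags as the healthcheck in docker-compose.yml.

```shell
# wait_for_http URL [TRIES] [DELAY_SECONDS]
# Polls URL with curl until it returns a non-error HTTP status.
# Returns 0 on success, 1 if every attempt failed.
wait_for_http() {
  url="$1"
  tries="${2:-30}"
  delay="${3:-2}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -fsS --max-time 2 -o /dev/null "$url" 2>/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Typical use (port 3033 comes from the mapping in docker-compose.yml):
#   docker-compose up -d
#   wait_for_http http://localhost:3033 && echo "Paperless is up"
#   docker-compose logs --tail=50 webserver   # inspect startup output if it never comes up
```

The first start can take a while because the database is initialized and migrations run, so a generous number of tries is reasonable.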
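Reference documentation
https://docs.paperless-ngx.com/configuration/#PAPERLESS_ADMIN_USER
The docker-compose.yml above already sets the superuser name and password
PAPERLESS_ADMIN_USER: admin
PAPERLESS_ADMIN_PASSWORD: admin
Open IP:port in a browser (port 3033 with the mapping above)
The login page appears
Username: admin
Password: admin
Login succeeds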
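Upload a document, for example a photo of a receipt
Click "Browse files"
After the upload succeeds, click "Open document"
Or click the Documents tab on the left
Then double-click the document you want to view
The details of the digitized document are shown
The content field holds the text that OCR recognized automatically
It can be found by keyword from the search bar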
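Besides uploading through the web page, the compose file mounts ./consume on the host to the consumption directory inside the container, so copying a file into that folder should get it imported as well. A small sketch; import_scan and CONSUME_DIR are names made up for this example, and the default path assumes you run it from the folder that holds docker-compose.yml:

```shell
# Host folder mounted at /usr/src/paperless/consume in docker-compose.yml
CONSUME_DIR="${CONSUME_DIR:-./consume}"

# import_scan FILE
# Copies FILE into the consume folder. A copy is used instead of a move because
# Paperless removes files from the consume folder once it has imported them.
import_scan() {
  src="$1"
  if [ ! -f "$src" ]; then
    echo "no such file: $src" >&2
    return 1
  fi
  mkdir -p "$CONSUME_DIR"
  cp -- "$src" "$CONSUME_DIR/"
}

# Example: import_scan ~/Scans/receipt.jpg
```

This is convenient for batches, for example pointing a scanner or a sync tool at the consume folder.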
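By default, though, only English is recognized; Simplified Chinese text is not
OCR configuration documentation
https://docs.paperless-ngx.com/configuration/#ocr
To request Simplified Chinese, add this parameter to docker-compose.yml
PAPERLESS_OCR_LANGUAGE: chi_sim
With only this parameter added, however, Paperless reports that the language is not supported
As far as I can tell, none of the OCR languages bundled with paperless-ngx by default cover Simplified Chinese
As a workaround, you can run a different OCR tool when saving a document and enter the recognized text by hand, so the document is easy to find later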
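One thing that may be worth trying before giving up on the built-in OCR: the comments in docker-compose.env describe PAPERLESS_OCR_LANGUAGES as the way to install additional Tesseract language packs, since the container only installs English, German, Italian, Spanish and French by default. The sketch below was not verified in this setup. It assumes the package suffix for Simplified Chinese is spelled with a hyphen (chi-sim) while the OCR language code uses an underscore (chi_sim), and that combining languages with + is accepted.

```shell
# File to append to; ENV_FILE is a helper variable for this sketch only
ENV_FILE="${ENV_FILE:-docker-compose.env}"

cat >> "$ENV_FILE" <<'EOF'
# Install the extra Tesseract language pack when the container starts (package-style name, hyphen)
PAPERLESS_OCR_LANGUAGES=chi-sim
# Use Simplified Chinese plus English for recognition (Tesseract-style code, underscore)
PAPERLESS_OCR_LANGUAGE=chi_sim+eng
EOF

# Then recreate the container so the new settings are picked up:
#   docker-compose up -d
```

If PAPERLESS_OCR_LANGUAGE is also set under environment: in docker-compose.yml, that value takes precedence over the env file, so keep it in one place only.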
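It works for storing PDFs, images, and similar files
I am not using it heavily yet; later I may add:
- IDs and cards (ID card, social security card, housing fund card, bank cards, membership cards, etc.)
- Receipts (supermarket receipts, food delivery receipts, deposit slips, and other transaction records)
- Contracts and agreements (employment contracts, medical checkup reports, separation agreements, non-disclosure agreements, bank contracts, rental leases, etc.)
- Instructions (product manuals, etc.)
The first drawback I ran into is that OCR handles only English out of the box
Later I may pair it with a self-hosted OCR server; the manual steps are more tedious, but the results are more accurate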
END.