[Python] Merge PDF files

SaltCityScripts · October 8, 2019, 4:47pm

First off, I am very new to Python, but not to scripting; I have done a lot of Bash scripting.

I need to create a Python script for work that will combine multiple PDF files into one PDF file and archive both the combined file and the original PDF files.
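What "archive" means will depend on your setup. A minimal sketch, assuming it means moving the combined PDF and the originals into a per-batch folder under a hypothetical archive directory (`archive_files`, `archive_root` and `batch_name` are illustrative names, not from the thread):

```python
import shutil
from pathlib import Path


def archive_files(files, archive_root, batch_name):
    """Move each file into archive_root/batch_name and return the new paths.

    Assumption: "archive" = move into a folder; archive_root and batch_name
    are hypothetical and should be adapted to your layout.
    """
    target_dir = Path(archive_root) / batch_name
    target_dir.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    moved = []
    for f in files:
        src = Path(f)
        dest = target_dir / src.name
        shutil.move(str(src), str(dest))  # move (not copy) into the archive
        moved.append(dest)
    return moved
```

Calling it as `archive_files(pdfs + [combined], "archive", "MU3-6493489-1")` would leave everything for that batch under `archive/MU3-6493489-1/`.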

We receive zip files from a client (the file name is either a number, #########.zip, or MU3-#######-#.zip). I need to unzip the zip file and use its file name as the name of the combined file.
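The two naming patterns can be checked with a regular expression before doing anything else. A sketch, assuming each `#` stands for exactly one digit (nine digits for the first form; seven digits and then one for the `MU3-` form, which matches `MU3-6493489-1.zip`); `batch_prefix` is a hypothetical helper name:

```python
import re
from pathlib import Path

# Assumption: '#' is a single digit, so the accepted forms are
# 123456789.zip and MU3-1234567-1.zip.
ZIP_NAME = re.compile(r"^(\d{9}|MU3-\d{7}-\d)\.zip$")


def batch_prefix(zip_path):
    """Return the zip's base name (used as the output prefix), or None if it doesn't match."""
    match = ZIP_NAME.match(Path(zip_path).name)
    return match.group(1) if match else None
```

Returning `None` for anything else lets the script skip or report unexpected files instead of producing a badly named output.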

Example:
MU3-6493489-1.zip

When I unzip this file, I get the following PDF files:
MU3-6493489-1_006493489-001_ARINV.pdf
MU3-6493489-1_3461.pdf
MU3-6493489-1_7501.pdf
MU3-6493489-1_CI_2.pdf
MU3-6493489-1_CI_3.pdf
MU3-6493489-1_CI_4.pdf
MU3-6493489-1_CI_5.pdf
MU3-6493489-1_CI.pdf

I need to combine all of these PDF files into a new file called MU3-6493489-1_combined.pdf.
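The merge order matters, and the thread doesn't say what order the pages should come in. Python's `sorted()` compares names character by character, and because `.` sorts before `_`, `..._CI.pdf` lands ahead of `..._CI_2.pdf` (unlike the listing above). A short sketch that builds the output name and a plain-sorted merge order from the example files; `combined_name` and `merge_order` are hypothetical helpers, and plain string sorting is an assumption about the desired order:

```python
def combined_name(prefix):
    # e.g. MU3-6493489-1 -> MU3-6493489-1_combined.pdf
    return f"{prefix}_combined.pdf"


def merge_order(names, prefix):
    """Keep only this batch's PDFs and sort them by plain string comparison."""
    wanted = [n for n in names if n.startswith(prefix + "_") and n.lower().endswith(".pdf")]
    # Exclude a combined file left over from an earlier run.
    wanted = [n for n in wanted if n != combined_name(prefix)]
    return sorted(wanted)


example = [
    "MU3-6493489-1_006493489-001_ARINV.pdf",
    "MU3-6493489-1_3461.pdf",
    "MU3-6493489-1_7501.pdf",
    "MU3-6493489-1_CI_2.pdf",
    "MU3-6493489-1_CI_3.pdf",
    "MU3-6493489-1_CI_4.pdf",
    "MU3-6493489-1_CI_5.pdf",
    "MU3-6493489-1_CI.pdf",
]
for name in merge_order(example, "MU3-6493489-1"):
    print(name)  # ARINV, 3461, 7501, CI, CI_2 ... CI_5
```

If the client's files need a different order (say, invoice last), a custom `key=` function for `sorted()` is the place to encode it.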

I did find a script online that has the basic stuff, but it will not work for my needs:

#pdf_merger.py

import glob
from PyPDF2 import PdfFileMerger

def merger(output_path, imput_paths):
    pdf_merger = PdfFileMerger()
    file_handles = []

    for path in input_paths:
        pdf_merger.append(path)

    with open(ouput_path, 'wb') as fileobj:
        pdf_merger.write(fileobj)

if _name_ == '_main_':
    paths = glob.glob('MU3_*.pdf')
    paths.sort()
    merger('pdf_merger2.pdf', paths)

This is not my code, and I am not attached to it in any way. I know I will have to make a lot of changes to get it to work.

If anyone has any thoughts on how to do this, I would be forever thankful, and so would my work.

--- Post updated at 08:47 PM ---

On a side note: this can be in either Perl or Python. I just thought Python would be better to learn.

I was not able to edit my original post.

Corona688 · October 8, 2019, 5:44pm

In what way does it not work for your needs?

SaltCityScripts · October 8, 2019, 5:49pm

Well, I can't get it to run, but the main thing is that this script doesn't unzip the zip file, and I need it to use the zip file's name as the prefix of the combined file name. Our system relies on the file name to match the file up.
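Both the unzip step and the prefix come from the standard library (`zipfile` and `pathlib`). A sketch, assuming the PDFs sit at the top level of the zip and extracting into a hypothetical working directory; `unzip_batch` is an illustrative name:

```python
import zipfile
from pathlib import Path


def unzip_batch(zip_path, work_dir):
    """Extract zip_path into work_dir/<prefix> and return (prefix, sorted PDF paths)."""
    zip_path = Path(zip_path)
    prefix = zip_path.stem  # MU3-6493489-1.zip -> MU3-6493489-1
    dest = Path(work_dir) / prefix
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        # Writes every member; the zipfile docs advise care with untrusted archives.
        zf.extractall(dest)
    pdfs = sorted(dest.glob("*.pdf"))  # top-level PDFs only, in name order
    return prefix, pdfs
```

The returned `prefix` is what the combined file's name can be built from, e.g. `f"{prefix}_combined.pdf"`.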

Corona688 · October 8, 2019, 6:09pm

In what way does it not run?

It will be far easier to modify something that works than something that doesn't. Python programs are 1% Python and 99% modules, so you may need to install a module or two; PyPDF2 looks important.
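One way to check whether a package such as PyPDF2 is installed, without the script dying on the import line, is `importlib.util.find_spec`. A small sketch; the printed install hint assumes `pip` is available for the interpreter running the script:

```python
import importlib.util
import sys


def has_module(name):
    """Return True if `name` can be imported by this interpreter."""
    return importlib.util.find_spec(name) is not None


if not has_module("PyPDF2"):
    # Point at *this* interpreter so the package lands where the script will look.
    print(f"PyPDF2 is missing; try: {sys.executable} -m pip install PyPDF2")
```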

jgt · October 9, 2019, 12:38pm

I do this using Ghostscript.
Use:
<code>
for pdf in *.pdf
do
    pdf2ps "$pdf" - >> big.ps
done
ps2pdf big.ps big.pdf</code>
Check the exact syntax for pdf2ps, and excuse any typos from my phone.
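Ghostscript can also merge PDFs directly with its `pdfwrite` device, skipping the intermediate PostScript file. This is a swapped-in variant, not the pdf2ps route above. A Python sketch that builds the command and runs it, assuming the `gs` executable is on `PATH` (`gs_merge_command` and `gs_merge` are hypothetical helper names):

```python
import subprocess


def gs_merge_command(output_pdf, input_pdfs, gs="gs"):
    """Build a Ghostscript command that writes all inputs, in order, to one PDF."""
    return [
        gs,
        "-dBATCH",            # exit when finished instead of waiting at a prompt
        "-dNOPAUSE",          # don't pause between pages
        "-q",                 # suppress startup messages
        "-sDEVICE=pdfwrite",  # produce PDF output
        f"-sOutputFile={output_pdf}",
        *[str(p) for p in input_pdfs],
    ]


def gs_merge(output_pdf, input_pdfs):
    # check=True raises CalledProcessError if gs exits with an error
    subprocess.run(gs_merge_command(output_pdf, input_pdfs), check=True)
```

Passing the argument list (rather than one shell string) avoids quoting problems with unusual file names.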

SaltCityScripts · October 9, 2019, 5:23pm

So when I run my script I get this error:

Traceback (most recent call last):
  File "pdf_merger.py", line 16, in <module>
    if _name_ == '_main_':
NameError: name '_name_' is not defined
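The `NameError` comes from the single underscores: the module attribute Python sets is `__name__` (two underscores on each side), and when a file is run directly its value is the string `'__main__'`. A minimal sketch of the usual guard, with a hypothetical `main()`:

```python
import glob


def main():
    # Hypothetical entry point: list this batch's PDFs in sorted order.
    return sorted(glob.glob("MU3-*.pdf"))


if __name__ == "__main__":
    # Runs only when the file is executed directly, not when it is imported.
    print(main())
```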

Corona688 · October 10, 2019, 12:57pm

Thanks. That statement looks redundant (the script is never going to be run as anything but main), so maybe try this:

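#pdf_merger.py

import glob
from PyPDF2 import PdfFileMerger

def merger(output_path, input_paths):
    # Append each input PDF, in order, to a single merger object
    pdf_merger = PdfFileMerger()

    for path in input_paths:
        pdf_merger.append(path)

    # Write the combined document, then release the open input files
    with open(output_path, 'wb') as fileobj:
        pdf_merger.write(fileobj)
    pdf_merger.close()

# The files are named MU3-..., with a hyphen, not MU3_...
paths = glob.glob('MU3-*.pdf')
paths.sort()
merger('pdf_merger2.pdf', paths)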
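The same merger can take its output name from the zip's base name instead of the fixed `pdf_merger2.pdf`. A sketch, assuming the PDFs have already been extracted into one folder; `merge_batch` is a hypothetical name, and the import tries the `PdfMerger` name used by later PyPDF2 releases before falling back to the `PdfFileMerger` name used in this thread (PyPDF2 1.x):

```python
from pathlib import Path


def merge_batch(pdf_dir, prefix):
    """Merge pdf_dir/<prefix>_*.pdf (sorted) into pdf_dir/<prefix>_combined.pdf."""
    pdf_dir = Path(pdf_dir)
    output = pdf_dir / f"{prefix}_combined.pdf"
    # Skip a combined file left over from an earlier run.
    inputs = sorted(p for p in pdf_dir.glob(f"{prefix}_*.pdf") if p != output)
    if not inputs:
        raise ValueError(f"no PDFs for {prefix} in {pdf_dir}")

    try:
        from PyPDF2 import PdfMerger  # later PyPDF2 releases
    except ImportError:
        from PyPDF2 import PdfFileMerger as PdfMerger  # PyPDF2 1.x, as in this thread

    merger = PdfMerger()
    for path in inputs:
        merger.append(str(path))
    with open(output, "wb") as fileobj:
        merger.write(fileobj)
    merger.close()
    return output
```

For the example batch, `merge_batch("MU3-6493489-1", "MU3-6493489-1")` would write `MU3-6493489-1/MU3-6493489-1_combined.pdf`.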