Dynamic command line generation with awk

Hi,
I'm not an awk expert, but I need a simple script to do the following:

I'd like to auto-crop PDF files.
I found two simple utilities that, combined, could help me automate it :slight_smile:

The first utility is "pdfinfo", which reports the MediaBox and TrimBox values of the PDF.
The pdfinfo output is:

[bags@TheBagsMan Scrivania]$ pdfinfo -box '/home/bags/Scrivania/0280.u3_capitolo15.pdf' 
Creator: Adobe InDesign CS3 (5.0.4)
Producer: Adobe PDF Library 8.0
CreationDate: Sun Nov 8 18:52:01 2009
ModDate: Tue Mar 16 14:48:00 2010
Tagged: no
Pages: 18
Encrypted: no
Page size: 680.268 x 898.535 pts
MediaBox: 0.00 0.00 680.27 898.53
CropBox: 0.00 0.00 680.27 898.53
BleedBox: 28.32 28.32 651.94 870.21
TrimBox: 42.50 42.50 637.77 856.04
ArtBox: 42.50 42.50 637.77 856.04
File size: 29864683 bytes
Optimized: no
PDF version: 1.6

Once I have the values, I have to pass them as input to another utility, "pdfcrop":

[bags@TheBagsMan Scrivania]$ pdfcrop --margins '-42.50 -42.50 -42.50 -42.49' '/home/bags/Scrivania/0280.u3_capitolo15.pdf' '/home/bags/Scrivania/0280.u3_capitolo15.pdf'

The problem is that the "--margins" values have to be calculated as MediaBox minus TrimBox, value by value:

first -42.50 --> 0.00-42.50; second -42.50 --> 0.00-42.50; third -42.50 --> 680.27-637.77; last -42.49 --> 898.53-856.04
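A minimal sketch of that arithmetic as a reusable shell function, assuming POSIX sh and awk. One assumption worth checking against pdfcrop --help: pdfcrop documents --margins as "<left> <top> <right> <bottom>", while pdfinfo lists each box as x1 y1 x2 y2 (left, bottom, right, top). The sketch therefore reorders the values and takes TrimBox minus MediaBox on the right and top edges, so all four margins come out negative. The function name pdf_margins is made up for this example.

```shell
#!/bin/sh
# Sketch: read `pdfinfo -box` output on stdin and print a --margins string.
# Assumption: pdfcrop wants "<left> <top> <right> <bottom>" (see pdfcrop --help),
# while pdfinfo lists each box as x1 y1 x2 y2 (left bottom right top).
pdf_margins() {
  awk '
    /^MediaBox:/ { mx1 = $2; my1 = $3; mx2 = $4; my2 = $5 }
    /^TrimBox:/  { tx1 = $2; ty1 = $3; tx2 = $4; ty2 = $5 }
    END {
      # negative margins make pdfcrop cut that much away on each side
      left   = mx1 - tx1
      top    = ty2 - my2
      right  = tx2 - mx2
      bottom = my1 - ty1
      printf "%.2f %.2f %.2f %.2f\n", left, top, right, bottom
    }'
}

# Example with the MediaBox/TrimBox values from the pdfinfo output above
printf 'MediaBox: 0.00 0.00 680.27 898.53\nTrimBox: 42.50 42.50 637.77 856.04\n' | pdf_margins
# prints: -42.50 -42.49 -42.50 -42.50
```

It would be used as pdfcrop --margins "$(pdfinfo -box "$file" | pdf_margins)" "$file" "$file". If the order assumption holds, the commands in this thread swap top and bottom, which hardly matters here because both are about -42.5.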

I'd like an awk script that builds the pdfcrop command...
Can someone help me?

thx,
Giovanni

margins=`pdfinfo -box '/home/bags/Scrivania/0280.u3_capitolo15.pdf' | awk '/MediaBox/ {a=$2;b=$3;c=$4;d=$5} /TrimBox/ {print a-$2,b-$3,c-$4,d-$5}'`

pdfcrop --margins "$margins" '/home/bags/Scrivania/0280.u3_capitolo15.pdf' '/home/bags/Scrivania/0280.u3_capitolo15.pdf'
Regarding "second -42.50 --> 0.00-42.50; third -42.50 --> 680.27-637.77; last -42.49 --> 898.53-856.04":

with that formula the last two values come out positive (680.27-637.77 = 42.50 and 898.53-856.04 = 42.49), not negative.

file=/home/bags/Scrivania/0280.u3_capitolo15.pdf
pdfcrop --margins "$(pdfinfo -box "$file" |awk '/^MediaBox:/ {m1=$2;m2=$3;m3=$4;m4=$5} /^TrimBox:/{print m1-$2,m2-$3,m3-$4,m4-$5}')" "$file"

Another one. Check the generated commands first by running it without the final "| sh":

#!/bin/sh

file="/home/bags/Scrivania/0280.u3_capitolo15.pdf"

pdfinfo -box "$file" |
awk -v f="$file" '
/MediaBox:/{a=$2; b=$3; c=$4; d=$5}
/TrimBox:/{printf("pdfcrop --margins \"%.2f %.2f %.2f %.2f\" \047%s\047 \047%s\047\n", a-$2, b-$3, c-$4, d-$5, f, f)}
' | sh

And another one (untested!):

awk 'BEGIN {
  pdfi = "pdfinfo"
  pdfc = "pdfcrop"
  # process each PDF named on the command line
  for (i = 1; i < ARGC; i++) {
    pdficmd = pdfi " -box \47" ARGV[i] "\47"
    pdfccmd = ""
    while ((pdficmd | getline) > 0) {
      # remember the MediaBox coordinates in t[2]..t[5]
      if (/^MediaBox/) split($0, t)
      # build the pdfcrop command from MediaBox - TrimBox
      if (/^TrimBox/)
        pdfccmd = sprintf("%s --margins \47%.2f %.2f %.2f %.2f\47 \47%s\47 \47%s\47", pdfc,
          t[2] - $2, t[3] - $3, t[4] - $4, t[5] - $5, ARGV[i], ARGV[i])
    }
    close(pdficmd)
    # only run pdfcrop if a TrimBox line was found
    if (pdfccmd != "") system(pdfccmd)
  }
}' file1.pdf [file2.pdf ...]

This works fine :slight_smile:

Thx a lot
Giovanni

---------- Post updated at 01:45 PM ---------- Previous update was at 01:29 PM ----------

Hmm,
if I use:

file="/home/bags/Scrivania/0280.u3_capitolo15.pdf"
pdfinfo -box "$file" | awk '/^MediaBox:/ {m2=$2;m3=$3;m4=$4;m5=$5} /^TrimBox:/{print m2-$2 ,m3-$3 ,$4-m4 ,$5-m5}'

I get the right values,
but if I use the whole command:

file="/home/bags/Scrivania/0280.u3_capitolo15.pdf"
pdfcrop --margins $(pdfinfo -box "$file" | awk '/^MediaBox:/ {m2=$2;m3=$3;m4=$4;m5=$5} /^TrimBox:/{print m2-$2 ,m3-$3 ,$4-m4 ,$5-m5}')" "$file"

I get an error :frowning:

PDFCROP 1.23, 2010/01/09 - Copyright (c) 2002-2010 by Heiko Oberdiek.
!!! Error: Input file `' not found!

Something seems wrong in the command expression :frowning:

The command should end up as:

pdfcrop --margins '-42.50 -42.50 -42.50 -42.49' '/home/bags/Scrivania/0280.u3_capitolo15.pdf' '/home/bags/Scrivania/0280.u3_capitolo15.pdf'

Giovanni

You didn't open the double quote before $(...).
By the way, you can also use any of the other solutions.

pdfcrop --margins "$(pdfinfo -box "$file" | awk '/^MediaBox:/ {m2=$2;m3=$3;m4=$4;m5=$5} /^TrimBox:/{print m2-$2 ,m3-$3 ,$4-m4 ,$5-m5}')" "$file"
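To see what the missing double quote changes, here is a small sketch that only counts how many arguments a command receives (count_args is a throwaway helper for illustration, not part of pdfcrop). Unquoted, the command substitution is split at every blank, so pdfcrop would get four separate numbers instead of one --margins value.

```shell
#!/bin/sh
# Sketch: print how many arguments a command actually receives.
# count_args is a hypothetical helper used only for this demonstration.
count_args() { echo "$#"; }

margins='-42.50 -42.50 -42.50 -42.49'

count_args --margins $(echo "$margins")    # unquoted: split into 4 words, prints 5
count_args --margins "$(echo "$margins")"  # quoted: kept as one word, prints 2
```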

OK,
here's the solution :stuck_out_tongue:

#!/bin/sh
file="/home/bags/Scrivania/0280.u3_capitolo15.pdf"
pdfinfo -box "$file" | awk -v f="$file" '/MediaBox:/{a=$2; b=$3; c=$4; d=$5} /TrimBox:/{n=f; sub(/.*\//, "", n); printf("pdfcrop --margins \"%.2f %.2f %.2f %.2f\" \047%s\047 \047%s\047\n", a-$2, b-$3, $4-c, $5-d, f, "/tmp/tagliati/" n)}' | sh
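The same idea can also be written without generating shell code and piping it into sh. In this sketch the margins are captured into a variable and pdfcrop gets the filename as an ordinary quoted argument, so a single quote in a filename cannot break the generated command line. It reuses the MediaBox/TrimBox arithmetic from the solution above; crop_to_trimbox is a hypothetical name, and its optional second argument (the output path) is an assumption of this sketch.

```shell
#!/bin/sh
# Sketch: run pdfcrop directly instead of printing a command for sh.
# crop_to_trimbox and its optional output argument are illustrative only.
# Note: plain sh has no local variables, so src/dst/margins are global.
crop_to_trimbox() {
  src=$1
  dst=${2:-$1}   # default: overwrite the input, as in the posts above
  margins=$(pdfinfo -box "$src" | awk '
    /^MediaBox:/ { a = $2; b = $3; c = $4; d = $5 }
    /^TrimBox:/  { printf "%.2f %.2f %.2f %.2f\n", a - $2, b - $3, $4 - c, $5 - d }')
  # give up if pdfinfo failed or printed no TrimBox line
  [ -n "$margins" ] || return 1
  pdfcrop --margins "$margins" "$src" "$dst"
}
```

For example: crop_to_trimbox "$file" "/tmp/tagliati/$(basename "$file")".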

Thanks to all.

Giovanni

---------- Post updated at 03:57 PM ---------- Previous update was at 02:35 PM ----------

I wrote this script, which should crop all *.pdf files in /home/bags/Scrivania/:

#!/bin/sh
cd /home/bags/Scrivania/

for file in `ls -r *.pdf`
	do
		echo "$file"
		pdfinfo -box "$file" | awk -v f="$file" '/MediaBox:/{a=$2; b=$3; c=$4; d=$5} /TrimBox:/{printf("pdfcrop --margins \"%.2f %.2f %.2f %.2f\" \047%s\047 \047%s\047\n", a-$2, b-$3, $4-c, $5-d, f, "/tmp/tagliati/"f)}' | sh
		# rm $file
	done

The problem is when a filename contains blanks :frowning: (for example "0020.test tost.pdf").
The "$file" variable gets "0020.test" and then "tost.pdf" :frowning:

Is there a way to get the complete filename?

Thx,
Giovanni

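Don't parse the output of ls; let the shell expand the glob, which gives you one complete filename per iteration:

for file in *.pdf; do ...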
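A quick way to see the difference, sketched in a throwaway directory created with mktemp -d (available on Linux and the BSDs, though not required by POSIX; the filenames are invented for the demonstration): the output of ls is split at every blank, while a glob hands the loop one complete filename at a time.

```shell
#!/bin/sh
# Sketch: compare looping over `ls` output with looping over a glob.
# Uses a temporary directory; the filenames are made up.
demo_dir=$(mktemp -d) || exit 1
touch "$demo_dir/0020.test tost.pdf" "$demo_dir/plain.pdf"
cd "$demo_dir" || exit 1

# ls output is word-split by the shell: 3 words for 2 files
n_ls=0
for file in `ls *.pdf`; do n_ls=$((n_ls + 1)); echo "ls:   [$file]"; done

# the glob yields one word per file, blanks included: 2 names
n_glob=0
for file in *.pdf; do n_glob=$((n_glob + 1)); echo "glob: [$file]"; done

cd / && rm -rf "$demo_dir"
```

In the script above only the for line has to change; "$file" is already quoted where it is used.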