Help required with a file rename shell script

tayyabq8 · June 25, 2019, 4:04am

Hello everyone,

Posting here after a long time; I've been away from the Unix world lately and it seems I have forgotten my shell scripting completely. I have a requirement where a csv file contains the following columns:

Full Registration	VIN			Stock ID	Mileage	InternalTrim	Description	Warranty	FranchiseApproved	RegistrationDate	Featured	New	Vehicle Type	Used stock images
2653BA			WDD1173461N6437866	2725		19434	Leather trim			Y		Y			19/09/2018		N		N	CAR	
8874MS			WDD1173461N6494217	2745		15452	Leather trim			Y		Y			19/09/2018		N		N	CAR

This csv file will be sent to the service provider who hosts our used car website. Every used car has some images, and the names of those images need to be populated in the last column of this csv file. There is one challenge: the image files are stored (in the same directory) as Stock ID_001, Stock ID_002 and so on. For this csv file example, the image files will be 002795_001, 002795_002, 002795_003, and likewise for the rest of the vehicles. We don't know in advance how many image files there will be for one stock ID. We need to rename these image files to VIN_1, VIN_2 and so on. For this example, the image file names will therefore become WDD1173461N6437866_1, WDD1173461N6437866_2, WDD1173461N6437866_3, because stock ID 2795 has VIN WDD1173461N6437866.

Once the image files are renamed, we need to populate these names in the last column of the above csv file. Please note that we have to ignore the header; image file names are to be populated from the 2nd record onwards (after renaming), as explained above. For the first record, the last column will become:

Full Registration	VIN			Stock ID	Mileage	InternalTrim	Description	Warranty	FranchiseApproved	RegistrationDate	Featured	New	Vehicle Type	Used stock images
2653BA			WDD1173461N6437866	2725		19434	Leather trim			Y		Y			19/09/2018		N		N	CAR		WDD1173461N6437866_1,WDD1173461N6437866_2,WDD1173461N6437866_3

Can you please help me with this challenge?

Regards,
Tayyab

RudiC · June 25, 2019, 4:24am

Try

awk '
NR == 2 {("ls -x *" $3 "*") | getline L
         gsub (/0+/, "", L)
         gsub ($3, $2, L)
         gsub (/[ 	]+/, ",", L)
         $0 = $0 "\t" L}
1
  ' file
Full Registration	VIN			Stock ID	Mileage	InternalTrim	Description	Warranty	FranchiseApproved	RegistrationDate	Featured	New	Vehicle Type	Used stock images
2653BA			WDD1173461N6437866	2795		19434	Leather trim			Y		Y			19/09/2018		N		N	CAR		WDD1173461N6437866_1,WDD1173461N6437866_2,WDD1173461N6437866_3
8874MS			WDD1173461N6494217	2745		15452	Leather trim			Y		Y			19/09/2018		N		N	CAR

Be aware there's no 2795 Stock ID in your sample csv file.

tayyabq8 · June 25, 2019, 4:44am

Hi RudiC,

thanks for your quick reply. I'll try that awk script at my end and come back. Sorry for confusing with the wrong stock ID number.

Regards,
Tayyab

tayyabq8 · June 25, 2019, 10:35am

Hi RudiC,

Sorry for the confusion and providing the wrong file format and image file names, here is the sample from csv file:

tayyab@c549:~$ cat test.csv
Full Registration,VIN,Stock ID,Mileage,InternalTrim,Description,Warranty,FranchiseApproved,RegistrationDate,Featured,NewVehicleType,Used stock images
2653BA,WDD1173461N6437866,2795,19434,Leather trim,A Class,Y,Y,19/09/2018,N,CAR,
8874MS,WDD1173461N6494217,2745,15452,Leather trim,A Class,Y,Y,19/09/2018,N,CAR,

And here is the list of images:

tayyab@c549:~$ ls -ltr 00*
-rw-rw-rw- 1 tayyab tayyab 0 Jun 25 17:59 002795_001.jpg
-rw-rw-rw- 1 tayyab tayyab 0 Jun 25 17:59 002795_002.jpg
-rw-rw-rw- 1 tayyab tayyab 0 Jun 25 18:00 002795_003.jpg
-rw-rw-rw- 1 tayyab tayyab 0 Jun 25 18:27 002745_001.jpg
-rw-rw-rw- 1 tayyab tayyab 0 Jun 25 18:27 002745_002.jpg

Once I ran the shell script, i want test.csv file to be like:

tayyab@c549:~$ cat test.csv
Full Registration,VIN,Stock ID,Mileage,InternalTrim,Description,Warranty,FranchiseApproved,RegistrationDate,Featured,NewVehicleType,Used stock images
2653BA,WDD1173461N6437866,2795,19434,Leather trim,A Class,Y,Y,19/09/2018,N,CAR,WDD1173461N6437866_1,WDD1173461N6437866_2,WDD1173461N6437866_3
8874MS,WDD1173461N6494217,2745,15452,Leather trim,A Class,Y,Y,19/09/2018,N,CAR,WDD1173461N6494217_1,WDD1173461N6494217_2

And also file names to be renamed to as follows:

WDD1173461N6437866_1 
WDD1173461N6437866_2
WDD1173461N6437866_3
WDD1173461N6494217_1
WDD1173461N6494217_2

There have to be two steps:

i) Look for stock ID within the stock feed csv file and rename all the images (replace stock ID with VIN code)
ii) Once the files are renamed then append those JPG images (with VIN# and _1, _2) to the last column of the feed file.

Your code didn't work out for me. If you or someone else can provide a solution to the above challenge, that would be great. Thanks.

Regards,
Tayyab

MadeInGermany · June 25, 2019, 2:10pm

Here is a standard shell script. It does not remove the .jpg extensions.
It only echoes things, so you get an idea what it could do.

#!/bin/sh
while IFS="," read f_reg vin s_id rest
do
  # do not process header and junk
  out=""
  if [ "$vin" != "VIN" ] && [ -n "$vin" ] && [ -n "$s_id" ]
  then 
    sep=""
    for i in `ls | grep "^0*${s_id}_"`
    do
      # rename from $s_id to $vin, delete leading 0 characters
      new_i=`echo "$i" | sed "s/^0*${s_id}_0*/${vin}_/"`
      echo mv "$i" "$new_i"
      out="${out}${sep}${new_i}"
      sep=","
    done
  fi
  # delete a trailing comma before adding a new one as separator
  echo "$f_reg,$vin,$s_id,${rest%,}${out:+,$out}"
done < test.csv

Scrutinizer · June 25, 2019, 3:39pm

Here is another example you could try, all in bash shell without external utilities or subshells except for the mv command, so it should be reasonably quick.

#!/bin/bash
oldIFS=$IFS
while IFS="," read -a car; do
  if [[ ${car[1]} != VIN ]]; then
    for file in *"${car[2]}_"*.jpg
    do
      if [ -f "$file" ]; then
        tofile=${car[1]}${file#*"${car[2]}"}
        echo mv -- "$file" "$tofile" 
        car[${#car[@]}]=$tofile
      fi
    done
  fi
  IFS=","
  printf "%s\n" "${car[*]}"
  IFS=$oldIFS
done < test.csv

It also does not entirely produce the proper output format:

mv -- 002795_001.jpg WDD1173461N6437866_001.jpg

but it should give you some idea.

Remove the echo statement when it does what you want...
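What is still missing from the target format is the zero-padded sequence number (_001 instead of _1). A minimal sketch of one way to strip it with parameter expansion only, in keeping with the no-external-utilities approach. new_name is a hypothetical helper, not part of the script above, and it assumes file names of the form stockid_seq.jpg:

```shell
# Hypothetical helper, not part of the script above.
# $1 = original image name (e.g. 002795_001.jpg), $2 = VIN
# Prints the new base name without the extension, e.g. VIN_1.
new_name() {
  seq=${1##*_}      # keep what follows the last underscore: 001.jpg
  seq=${seq%.jpg}   # drop the extension: 001
  # strip leading zeros one at a time, but keep a lone "0"
  while [ "${#seq}" -gt 1 ] && [ "${seq#0}" != "$seq" ]; do
    seq=${seq#0}
  done
  printf '%s_%s\n' "$2" "$seq"
}

new_name 002795_001.jpg WDD1173461N6437866
```

The printed name is what goes into the csv column; append .jpg to it for the mv target.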

RudiC · June 25, 2019, 4:46pm

No surprise. Your post #1 input sample provided the wrong structure, contained wrong data, and the sample file names didn't reflect reality either. What would you expect? With your "revised" data and input, try

awk '
NR >= 2 {("ls -x *" $3 "*.jpg") | getline L
         gsub (/[ \t]+/, ",", L)
         LB = L
         gsub (/0+|.jpg/, "", L)
         gsub ($3, $2, L)
         $0 = $0  L
              split (LB, T1)
         n  = split (L,  T2)
         for (; n; n--) OUT = OUT sprintf ("echo mv -- %s %s\n", T1[n], T2[n] ".jpg")
        }
1
END     {system (OUT)
        }
' FS=, file
Full Registration,VIN,Stock ID,Mileage,InternalTrim,Description,Warranty,FranchiseApproved,RegistrationDate,Featured,NewVehicleType,Used stock images
2653BA,WDD1173461N6437866,2795,19434,Leather trim,A Class,Y,Y,19/09/2018,N,CAR,WDD1173461N6437866_1,WDD1173461N6437866_2,WDD1173461N6437866_3
8874MS,WDD1173461N6494217,2745,15452,Leather trim,A Class,Y,Y,19/09/2018,N,CAR,WDD1173461N6494217_1,WDD1173461N6494217_2
mv -- 002795_003.jpg WDD1173461N6437866_3.jpg
mv -- 002795_002.jpg WDD1173461N6437866_2.jpg
mv -- 002795_001.jpg WDD1173461N6437866_1.jpg
mv -- 002745_002.jpg WDD1173461N6494217_2.jpg
mv -- 002745_001.jpg WDD1173461N6494217_1.jpg

The first three lines are your target .csv file printed to stdout. The remaining 5 lines are the test output from the system call; remove the echo from the sprintf command to really have the files renamed.

EDIT: Slight simplification:

awk '
NR >= 2 {CMD = "ls *" $3 "*.jpg"
         while (1 == CMD | getline FN)
                {FNN = FN
                 gsub (/0+|.jpg/, "", FNN)
                 gsub ($3, $2, FNN)
                 $0 = $0 FS FNN
                 OUT = OUT "echo mv -- " FN " " FNN ".jpg\n"
                }
         close (CMD)
        }
1
END     {system (OUT)
        }
' FS=, file

MadeInGermany · June 25, 2019, 4:50pm

The problem with a *${var}_* glob is that it is missing a leading anchor.
E.g. $var==2795 will match 002795_001.jpg and 012795_001.jpg
Reason enough to go for the less efficient but correct `ls | grep` and a ^0* anchor in the RE.
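A small sketch of the difference, with the candidate names passed as arguments so that it does not depend on real files. match_glob and match_anchored are hypothetical names used only for this illustration:

```shell
# $1 = stock ID, remaining arguments = candidate file names

# Unanchored, like the *${var}_* glob: anything may precede the ID.
match_glob() {
  id=$1; shift
  for name in "$@"; do
    case $name in
      *"${id}_"*) printf '%s\n' "$name" ;;
    esac
  done
}

# Anchored, like ls | grep "^0*${s_id}_": only zeros may precede the ID.
match_anchored() {
  id=$1; shift
  printf '%s\n' "$@" | grep "^0*${id}_"
}

match_glob     2795 002795_001.jpg 012795_001.jpg   # prints both names
match_anchored 2795 002795_001.jpg 012795_001.jpg   # prints 002795_001.jpg only
```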

tayyabq8 · June 26, 2019, 4:36am

Thanks to everyone, great solutions, and discussion. I'll try all of these out and let you know if I need any further help. I'll also post here my modified script which will go into production. Apologies again for the initial confusion. Have a great day ahead.

Chubler_XL · July 24, 2019, 5:33pm

I was thinking that bash Extended globbing and nullglob could be the solution to this, without the efficiency hit:

awk '
NR >= 2 {
  CMD="bash -O extglob -O nullglob -c \"printf \\\"%s\\n\\\" *(0)" $3 "_*.jpg\""
  while (CMD | getline FN && length(FN)) {
...

However, this still ends up invoking sh and then bash, which is only slightly better than sh, ls and grep. The complexity (all those backslashes!) and the reliance on bash make this more of an academic solution than something I'd be inclined to use in the real world.
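For anyone who wants to try the pattern on its own, outside awk, here is a minimal sketch. list_images is a hypothetical helper name, and bash is assumed to be installed and on the PATH:

```shell
# Hypothetical helper: list the images of one stock ID via bash extended globbing.
# $1 = directory holding the images, $2 = stock ID
list_images() {
  ( cd "$1" &&
    bash -O extglob -O nullglob -c 'printf "%s\n" *(0)"$1"_*.jpg' bash "$2" )
}

# Usage: list_images . 2795
```

*(0) matches zero or more "0" characters, so 002795_001.jpg is found while 012795_001.jpg is not. With nullglob set and no match, printf still prints one empty line, which is presumably why the awk loop above also checks length(FN).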

tayyabq8 · August 26, 2019, 3:28am

Hi RudiC,

I used your simplified method and it worked fine. The only trouble I have is that the script doesn't work if the stock ID has a "0" in it; for example, it will not work if the image file name is "002501_001.jpg". Please note that the 3rd to 6th characters are the stock ID and can contain "0". Can you please modify your script to cater for this? This would be a really great help. Many thanks in advance.

Regards,
Tayyab

RudiC · August 26, 2019, 3:49am

Would it suffice to "anchor" the gsub search pattern like

gsub (/^0+|.jpg/, "", FNN)

? Be aware that the files' sequence number will have leading zeroes, then.

tayyabq8 · August 26, 2019, 3:52am

Hi RudiC,

Yes, I thought about it, and anchoring alone will not help, as the sequence numbers will then keep their leading zeros, as you mentioned. Any other advice, please? Many thanks once again.

Regards,
Tayyab

RudiC · August 26, 2019, 3:54am

OK, add

sub  (/_0+/, "_", FNN)

just below the first gsub.
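Put together, the simplified script with both changes could look roughly like the sketch below. It takes three liberties that are assumptions rather than part of the posts above: the getline call is parenthesised and tested with > 0, the dot in .jpg is escaped, and the separator is skipped when the row already ends in a comma. rename_and_list is a hypothetical wrapper name; the echo is kept, so nothing is renamed yet.

```shell
# Sketch only: rename_and_list is a hypothetical wrapper name.
# Run it from the directory that holds the images; $1 is the CSV file.
rename_and_list() {
  awk '
  NR >= 2 {CMD = "ls *" $3 "*.jpg"
           while ((CMD | getline FN) > 0)
                  {FNN = FN
                   gsub (/^0+|\.jpg/, "", FNN)   # leading zeros of the stock ID, extension
                   sub  (/_0+/, "_", FNN)        # leading zeros of the sequence number
                   gsub ($3, $2, FNN)            # stock ID -> VIN
                   # assumption: skip the separator when the row already ends in a comma
                   $0 = $0 ($0 ~ /,$/ ? "" : FS) FNN
                   OUT = OUT "echo mv -- " FN " " FNN ".jpg\n"
                  }
           close (CMD)
          }
  1
  END     {system (OUT)
          }
  ' FS=, "$1"
}

# Usage: rename_and_list test.csv
```

The ls pattern is still unanchored here, so the earlier remark about 012795_001.jpg also matching stock ID 2795 applies to this sketch as well.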

tayyabq8 · August 26, 2019, 4:29am

Hi RudiC,

Perfect. It worked. I'm marking this case as solved. Many thanks for your support.

Regards,
Tayyab