rename files Ax based on strings found in files Bx

inCH · October 7, 2009, 2:26am

Hi,

I'm not very experienced in shell scripting and that's probably why I came across the following problem:

I have several hundred pairs of text files (PF00x.spl and PF00x.shd) where the first file (PF00x.spl) needs to be renamed according to a string that is included in the second file (PF00x.shd).

Example:

PF00x.shd contains a string (its length varies from file to file; an example would be AB09098765 7 6 5) between the following keywords: "bviwacsbviwacs" and "Adb".

The goal is to rename PF00x.spl to AB09098765 7 6 5.spl and so on for the remaining files.
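If the keywords appear as ordinary adjacent characters, as described here, the string part of the problem needs nothing beyond POSIX shell parameter expansion. This is a minimal sketch under that assumption; the function name `extract_between` is hypothetical:

```sh
# Print the text between the first start marker and the next end marker.
# Returns 1 if either marker is missing.
# Assumes the input is already a plain string (no NUL bytes).
extract_between() {
    text=$1 start=$2 end=$3
    rest=${text#*"$start"}              # drop everything up to and including the start marker
    [ "$rest" = "$text" ] && return 1   # start marker not found
    case $rest in
        *"$end"*) ;;                    # end marker present after the start marker
        *) return 1 ;;
    esac
    printf '%s\n' "${rest%%"$end"*}"    # drop the end marker and everything after it
}

extract_between 'xxbviwacsbviwacsAB09098765 7 6 5Adbyy' bviwacsbviwacs Adb
# prints: AB09098765 7 6 5
```

`${text#*"$start"}` removes the shortest prefix ending in the start keyword, and `${rest%%"$end"*}` removes everything from the first end keyword onward. Quoting the variables inside the patterns makes the keywords match literally rather than as glob patterns.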

I hope my description makes sense... any suggestions on how I could solve that problem?

Thanks a lot!

steadyonabix · October 7, 2009, 2:45am

Give some example lines from the second file rather than describing them.

inCH · October 7, 2009, 3:20am

Here you go (extract from PF001.shd):

�2�H&��a �2�H&� b v i w a c s b v i w a c s s m u A B 4 6 2 6 6 7 _ 2 0 0 9 0 8 1 4 7 3 9 4 3 7 1 0 - B e r i c h t A d b P L F A d b \ \ P C 2 6 7 I W A C S

And I'm looking for the string between the two keywords: s m u A B 4 6 2 6 6 7 _ 2 0 0 9 0 8 1 4 7 3 9 4 3 7 1 0 - B e r i c h t

Thanks,

steadyonabix · October 7, 2009, 3:12pm

This works in ksh when run in the same directory as the target files: -

ls *.shd | while read FILE_NAME
do
        if [[ -w ${FILE_NAME%.*}.spl ]] ## Ignore any file with no .spl file
        then
                NEW_FILE_NAME=$( nawk ' BEGIN {
                        ## Spaces in target strings ????
                        first = "b v i w a c s b v i w a c s"
                        last = "A d b"
                }
                ( $0 ~ first ) && ( $0 ~ last ) {
                        startm = index( $0, first ) + length( first )
                        endm = index( $0, last ) - 1
                        ## substr takes a length: endm - startm + 1 keeps the last character
                        file_name = substr( $0, startm, endm - startm + 1 )
                } END {
                        ## Remove spaces from new file name
                        gsub( / /, "", file_name )
                        print file_name
                } ' "$FILE_NAME" ).spl

                ## Copy. change cp to mv if required
                cp "${FILE_NAME%.*}.spl" "$NEW_FILE_NAME"
        else
                echo "File ${FILE_NAME} has no matching .spl file"
        fi
done
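Piping `ls` into `while read` works for names like PF001.shd, but a shell glob does the same job without parsing `ls` output and copes with spaces in file names. Here is a sketch of just the pairing step as a hypothetical function `pair_files`; the extraction and copy from the script above would go where the first `printf` is:

```sh
# Pair each .shd file with the .spl file of the same base name, using a glob
# instead of parsing ls output. ${shd%.*} strips the shortest trailing ".suffix".
pair_files() {
    for shd in *.shd; do
        [ -e "$shd" ] || continue                # no .shd files: the glob stays literal
        spl=${shd%.*}.spl
        if [ -w "$spl" ]; then
            printf '%s -> %s\n' "$shd" "$spl"    # extraction and copy would go here
        else
            printf 'File %s has no matching .spl file\n' "$shd"
        fi
    done
}
```

In a directory holding PF00x.shd, PF00x.spl and PF01x.shd, it prints `PF00x.shd -> PF00x.spl` followed by `File PF01x.shd has no matching .spl file`.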

Your first post had no spaces in the enclosing strings, but your second had spaces as well as unprintable characters, so I created a test file with spaces in it as a worst case. Running the code as a script called "inch" with these files: -

TX5XN:/home/brad/forum/inch>ls -l
total 8
-rwxrwxrwx 1 brad root 682 2009-10-07 20:07 inch
-rw-r--r-- 1 brad root 143 2009-10-07 18:50 PF00x.shd
-rw-r--r-- 1 brad root   0 2009-10-07 19:37 PF00x.spl
-rw-r--r-- 1 brad root   0 2009-10-07 19:59 PF01x.shd

Gave this output: -

TX5XN:/home/brad/forum/inch>inch
File PF01x.shd has no matching .spl file

TX5XN:/home/brad/forum/inch>ls -l
total 8
-rwxrwxrwx 1 brad root 682 2009-10-07 20:07 inch
-rw-r--r-- 1 brad root 143 2009-10-07 18:50 PF00x.shd
-rw-r--r-- 1 brad root   0 2009-10-07 19:37 PF00x.spl
-rw-r--r-- 1 brad root   0 2009-10-07 19:59 PF01x.shd
-rw-r--r-- 1 brad root   0 2009-10-07 20:11 smuAB462667_2009081473943710-Bericht.spl

inCH · October 8, 2009, 2:23am

@steadyonabix

Thanks a lot! I'm going to try to adapt your code...

Regards,

inCH · October 8, 2009, 7:22am

@steadyonabix thanks a lot for your code!

I tried to adapt your code but I always get only one file named ".SPL" even when I change the keywords to a single letter for testing purposes...
Seems like I'm doing something the wrong way.

Additional background information: I currently have to use Cygwin with bash on WinXP...
Therefore I amended the code as listed at the end of this post.
The keywords are b v i w a c s b v i w a c s (at least when I'm looking at them in bash) and A d o b e

The result I'm looking for would be smuAB462691_200908148 19 27 296 - Bericht.SPL

I've attached two files (one .spl and one original .shd) for illustration. It would be great if you could take a quick look at them.

Thanks again for your help!

-------
current code:
-------

ls *.SHD | while read FILE_NAME
do
        if [[ -w ${FILE_NAME%.*}.SPL ]] ## Ignore any file with no .spl file
        then
                NEW_FILE_NAME=$( awk ' BEGIN {
                        ## Spaces in target strings ????
                        first = "b v i w a c s   b v i w a c s"   
                        last = "A d o b e"
                }
                ( $0 ~ first ) && ( $0 ~ last ) {
                        startm = index( $0, first ) + length( first )
                        endm = index( $0, last ) - 1
                        file_name = substr( $0, startm, endm - startm )
                } END {
                        ## Remove spaces from new file name
                        gsub( / /, "", file_name )              
                        print file_name
                } ' $FILE_NAME).SPL

                ## Copy. change cp to mv if required
                cp ${FILE_NAME%.*}.SPL $NEW_FILE_NAME           
        else
                echo "File ${FILE_NAME} has no matching .spl file"
        fi
done

steadyonabix · October 8, 2009, 1:11pm

Hi

I am going out tonight so will not be able to look at this before tomorrow.

I don't use Cygwin so can't promise much.

There is no substitute for experimenting with it yourself though in the meantime.

Good luck

steadyonabix · October 9, 2009, 1:30pm

Hi Inch

Well this turned out to be more interesting than I originally thought it would be: -

When I got your file I found it would not behave with awk, sed or any of the usual utilities, so I looked at its internal structure with an octal dump of the contents.
Here is the part containing your start string: -

TX5XN:/home/brad/forum/inch>od -c FP00000.SHD | pg

0003440   H   & 375 032 001 002  \0  \0   b  \0   v  \0   i  \0   w  \0
0003460   a  \0   c  \0   s  \0  \0  \0   b  \0   v  \0   i  \0   w  \0
0003500   a  \0   c  \0   s  \0  \0  \0   s  \0   m  \0   u  \0   A  \0

As you can see, each letter is followed by a zero (null) byte, so tools like awk and sed fail: they expect ordinary text, and the letters of your keywords are never next to each other in the file.
This is why your original paste of the file contained so many unprintable characters and spaces between each letter of your target string.

Having established the problem, the fix is simple: remove the nulls before manipulating the strings: -

TX5XN:/home/brad/forum/inch>tr -d "\000" < FP00000.SHD | strings
Adobe PDF
PRIV
EBDA
Standard
bviwacsbviwacssmuAB462691_200908148 19 27 296 - BerichtAdobe PDFAdobe PDF ConverterWinPrintNT EMF 1.008\\PC267IWACS
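The effect is easy to reproduce on a tiny synthetic file. The name `sample.SHD` is made up, so run this in a scratch directory. `printf` reuses its format for every argument, so `'%s\000'` writes each letter followed by a NUL byte, the same layout as in the od dump:

```sh
# Every character followed by a NUL byte: b\0v\0i\0w\0a\0c\0s\0
printf '%s\000' b v i w a c s > sample.SHD

od -c sample.SHD                  # shows the letters separated by \0

# The letters are never adjacent, so a plain search for the keyword fails
if grep -q bviwacs sample.SHD; then echo found; else echo 'not found'; fi

# Deleting the NUL bytes gives back an ordinary string
tr -d '\000' < sample.SHD; echo   # prints bviwacs
```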

So adding the null stripping to the tool lets it process the string we want to turn into a file name: -

TX5XN:/home/brad/forum/inch>inch
TX5XN:/home/brad/forum/inch>ls -l
total 168
-rw-r--r-- 1 brad root  2080 2009-08-14 20:10 FP00000.SHD
-rw-r--r-- 1 brad root 75368 2009-10-08 11:46 FP00000.SPL
-rwxrwxrwx 1 brad root   696 2009-10-09 17:48 inch
-rw-r--r-- 1 brad root 75368 2009-10-09 17:48 smuAB462691_2009081481927296-Berich.SPL

Here is the modified code: -

ls *.SHD | while read FILE_NAME
do
    if [[ -w ${FILE_NAME%.*}.SPL ]]    ## Ignore any file with no .SPL file
    then
        NEW_FILE_NAME=$( tr -d "\000" < "$FILE_NAME" | strings | nawk ' BEGIN {
            ## Start and end markers; tr has already removed the nulls
            first = "bviwacsbviwacs"
            last = "Adobe"
        }
        ( $0 ~ first ) && ( $0 ~ last ) {
            startm = index( $0, first ) + length( first )
            endm = index( $0, last ) - 1
            ## substr takes a length: endm - startm + 1 keeps the last character
            file_name = substr( $0, startm, endm - startm + 1 )
        } END {
            ## Remove spaces from new file name
            gsub( / /, "", file_name )
            print file_name
        } ' ).SPL

        ## Copy. change cp to mv if required
        cp "${FILE_NAME%.*}.SPL" "$NEW_FILE_NAME"
    else
        echo "File ${FILE_NAME} has no matching .SPL file"
    fi
done
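The extraction step can also be pulled out into a function and tested on its own. This is a sketch, not the script above. The function name `shd_name` is hypothetical. It uses plain `awk` (Cygwin may not have `nawk`), hard-codes the markers `bviwacsbviwacs` and `Adobe` from inCH's file, keeps the spaces inside the name as in the result inCH asked for, and looks for the end marker only after the start marker. Note that awk's `substr( s, m, n )` takes a length `n`, not an end position. A length one too small is what drops the final `t` of `Bericht` in the listing above.

```sh
# Hypothetical helper: print the name found between the start and end markers
# in a .SHD file whose characters are separated by NUL bytes.
shd_name() {
    tr -d '\000' < "$1" | awk -v first='bviwacsbviwacs' -v last='Adobe' '
        {
            s = index($0, first)
            if (s == 0) next                  # start marker not on this line
            startm = s + length(first)        # first character after the marker
            rest = substr($0, startm)
            e = index(rest, last)             # search for the end marker only after it
            if (e == 0) next
            print substr(rest, 1, e - 1)      # the e - 1 characters before "Adobe"
            exit
        }'
}
```

`shd_name FP00000.SHD` prints the name without the `.SPL` suffix, or nothing if the markers are missing. Because the name may contain spaces, quote it when building the new file name, e.g. `cp FP00000.SPL "$(shd_name FP00000.SHD).SPL"`.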

Note that there is no real error checking (for existing files etc.); I will leave you to add that yourself.
You should also note that it needs to be run in the target directory, so you will need to modify it if you want to handle multiple directories.
Until you add this kind of validation, I would copy the files into a work directory and process them there first to avoid any unfortunate mishaps or lost data.
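One way to add that validation is sketched below as a hypothetical helper, `safe_copy`. It refuses to copy when no name was extracted (which is how a lone `.SPL` file gets created) or when the target file already exists. It takes the name without the `.SPL` suffix and, like the script, copies rather than moves:

```sh
# Hypothetical guard: copy SRC to NAME.SPL only if NAME is non-empty and
# NAME.SPL does not exist yet. Returns 1 (and copies nothing) otherwise.
safe_copy() {
    src=$1 name=$2
    if [ -z "$name" ]; then
        printf 'No name found for %s, skipping\n' "$src" >&2   # would otherwise create ".SPL"
        return 1
    fi
    dest=$name.SPL
    if [ -e "$dest" ]; then
        printf '%s already exists, not overwriting\n' "$dest" >&2
        return 1
    fi
    cp -- "$src" "$dest"
}
```

For example, `safe_copy FP00000.SPL "smuAB462691_2009081481927296-Bericht"` creates the renamed copy once and refuses on a second run.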

Hope it is useful.

inCH · October 12, 2009, 1:52am

Cool, thanks a lot!
Now it's working perfectly.