awk filelist containing strange characters

I've written a script:

find -depth | awk ‘
   {
      if ( substr($1,length($0)-2,3) == “/1.” )
         { print $1 }
         { system(“awk -f test1.awk ” $1 ) }
   }
’

The idea is that it trundles through a large directory structure looking for files which are named '1.' and then passes them to an awk script which processes them. However, the directory structure has some unusual folders which are tripping it up. These folders have the caret (^) and commas (,) in their names, for example:

INBOX^Year2011
Details for Sue, Bob, Jane

I think with those sorts of folder names you have to enclose everything in double quotes, but I'm not sure how to do that in the above script. I've tried playing around with variables, and /" and suchlike, but I'm not sure if it's even possible. My scripts are displaying errors such as 'Cannot open.... no such file or directory...' The {print $1} line seems to work - it displays the full path to the file.

Thanks.

Careful, whatever word processor you're using to edit your text is substituting odd characters for ' and ".

You could move that filtering work into find, which makes the outer awk filter redundant and lets find hand each filename straight to your awk script. No special quoting is necessary this way.

find -depth -type f -name '1.' -print -exec echo awk -f test1.awk '{}' ';'

The -type f restricts matches to regular files, -name makes it find files named exactly "1.", -print makes it print the name to stdout first, then -exec runs your awk script on the file, or would if the echo weren't there.

Remove the 'echo' once you've tested that it does exactly what you want.
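
For what it's worth, here is a rough sketch of why this approach copes with those folder names. In the original pipeline, awk splits each line on whitespace, so $1 holds only part of a path that contains spaces, and system() then hands that unquoted text to a shell. With -exec, find passes each path to the command as a single argument and no shell ever re-parses it. The folder names below come from the question; the scratch directory, the extra "2." file and the inline awk program (standing in for awk -f test1.awk) are assumptions made up for the demonstration.

```shell
# Build a throwaway tree using the awkward folder names from the question.
tmp=$(mktemp -d)
mkdir -p "$tmp/INBOX^Year2011" "$tmp/Details for Sue, Bob, Jane"
printf 'hello\n' > "$tmp/INBOX^Year2011/1."
printf 'world\n' > "$tmp/Details for Sue, Bob, Jane/1."
printf 'skip\n'  > "$tmp/Details for Sue, Bob, Jane/2."   # must not match

# find passes each matching path as one argument, spaces and commas included.
# The inline awk program is a hypothetical stand-in for: awk -f test1.awk
result=$(cd "$tmp" &&
  find . -depth -type f -name '1.' \
       -exec awk 'END { print FILENAME ": " NR " line(s)" }' '{}' ';' |
  LC_ALL=C sort)

printf '%s\n' "$result"
rm -rf "$tmp"
```

Each "1." file should be reported once with its full path intact, and the "2." file should be ignored.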

Thanks very much for that - I'll give it a whirl.

I just re-typed the script into Word on my PC so that I could post it here - the original was created using vi on my Unix server.

I suggest typing into either the text box the web page provides or Notepad, not just here but on any technical forum. Word makes many untoward substitutions beyond the ones I've noted, which would mess up a script and make it difficult to deduce what the code actually was.