Find a string and then return the next 20 characters in multiple files

jdinero · September 25, 2013, 11:12am

Hello all,

I have a directory with 2000+ files. I need to look in each file for an invoice number. To identify it, I can search for the string 'BIG' and then retrieve the next 30 characters. I was thinking of awk for this, but I'm not sure how to do it. Each file contains one long string, and the invoice number is in the middle. If I can find the position of the 'BIG' pattern and then grab the next 30 characters, I can extract the invoice number I need.

I basically need to pull out all 2000+ invoice numbers and put them in one file, one invoice number per line.

Any help is much appreciated!

SAMPLE input:

TEST FILE|USING|NEW|SYSTEM|BIG|20130924|49685234|THIS ISNT THE END|BYE

output needed:

BIG|20130924|49685234

Keep in mind I need to do this for 2000+ files in one directory.

THANKS!

Jennifer

Subbeh · September 25, 2013, 11:29am

I only count 18 characters after BIG. If that's the case, you can use this:

grep -oE 'BIG.{18}' file

disedorgue · September 25, 2013, 11:29am

Hi,
What you asked for:

grep -o 'BIG.\{1,30\}' file

But this may be better:

grep -o 'BIG|\([^|]\+|\)\{1,2\}' file

Here, file is a list of files; use * for every file in the directory.
Add the -h option if you don't want the file name in the result.

Regards.
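
For illustration, here is a minimal sketch of the grep approach run over a whole directory. It assumes a grep that supports -o, -h and -E (GNU grep does; some older system greps do not). The sandbox directory and the two sample files are made up for the example, and the pattern is a variation on the one above: it takes the two fields after BIG and leaves out the trailing |.

```shell
# Hypothetical sandbox so the example is self-contained.
workdir=$(mktemp -d)
printf '%s\n' 'TEST FILE|USING|NEW|SYSTEM|BIG|20130924|49685234|THIS ISNT THE END|BYE' > "$workdir/a.txt"
printf '%s\n' 'X|BIG|20130925|777|Y' > "$workdir/b.txt"

# -o prints only the matched text, -h suppresses the file names,
# -E enables extended regexes: BIG followed by exactly two |field groups.
fields=$(cd "$workdir" && grep -ohE 'BIG(\|[^|]+){2}' *)
printf '%s\n' "$fields"
```

Matching on the | delimiters rather than a fixed width means a shorter or longer invoice number is still picked up whole.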

jdinero · September 25, 2013, 11:32am

Thank you both, but when I try those commands, it says it doesn't recognize the -o flag.

disedorgue · September 25, 2013, 11:35am

Ok,
with sed:

sed -n 's/.*\(BIG.\{1,30\}\).*/\1/p'

sed -n 's/.*\(BIG|\([^|]\+|\)\{1,2\}\).*/\1/p'

regards.

jdinero · September 25, 2013, 11:38am

Thanks for the sed, but I am not sure how to use that with a list of files.

disedorgue · September 25, 2013, 11:46am

As explained for grep:

sed .... *

Regards.
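
A possible way to put it all together and collect everything into one file, one match per line. This is only a sketch: the directory layout, the sample files and the output name invoice_numbers.txt are assumptions for the example, and the fixed width of 18 comes from the sample line in the first post.

```shell
# Hypothetical sandbox: a directory of invoice files plus a place for the result.
workdir=$(mktemp -d)
mkdir "$workdir/invoices"
printf '%s\n' 'TEST FILE|USING|NEW|SYSTEM|BIG|20130924|49685234|THIS ISNT THE END|BYE' > "$workdir/invoices/a.txt"
printf '%s\n' 'OTHER|BIG|20130925|49685235|MORE|DATA' > "$workdir/invoices/b.txt"

# Run sed over every file in the directory. The output file lives outside
# the scanned directory, so a second run does not pick it up through the * glob.
( cd "$workdir/invoices" && sed -n 's/.*\(BIG.\{18\}\).*/\1/p' * ) > "$workdir/invoice_numbers.txt"

cat "$workdir/invoice_numbers.txt"
```

Because the leading .* is greedy, a line containing BIG more than once yields the last occurrence, not the first.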

jdinero · September 25, 2013, 11:47am

wonderful, it works perfectly!!!

RudiC · September 25, 2013, 3:05pm

You may want to know the filename attached to that invoice number, and the invoice number may be less than, or even more than 8 chars. Try this:

awk '{for (i=1;  i<=NF; i++) if ($i=="BIG") print FILENAME, ": ", $i, $(i+1), $(i+2)}' FS="|"  *
file :  BIG 20130924 49685234

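In case there's only one InvNo per file, and your awk has the "nextfile" command, you may want to add nextfile right after the print (with both wrapped in braces, so that it runs only on a match). awk then skips the rest of the file as soon as it has found the invoice number.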
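
A sketch of what the nextfile variant might look like, assuming an awk that supports nextfile (gawk and recent mawk do; some older awks do not). The sandbox files are invented for the example, and the output format is slightly changed from the one-liner above so that each line reads "file: BIG|date|number".

```shell
# Hypothetical sandbox; b.txt has two BIG lines to show that only the first is reported.
workdir=$(mktemp -d)
printf '%s\n' 'TEST FILE|USING|NEW|SYSTEM|BIG|20130924|49685234|THIS ISNT THE END|BYE' > "$workdir/a.txt"
printf '%s\n' 'X|BIG|20130925|777|Y' 'Z|BIG|20130926|888|W' > "$workdir/b.txt"

# Split on |, look for a field equal to BIG, print it with the next two fields,
# then jump straight to the next input file.
report=$(cd "$workdir" && awk -F'|' '
  {
    for (i = 1; i <= NF; i++)
      if ($i == "BIG") {
        print FILENAME ": " $i "|" $(i+1) "|" $(i+2)
        nextfile
      }
  }' *)
printf '%s\n' "$report"
```

With 2000+ files this also saves awk from reading the remainder of each file once the match has been found.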