Want to grep records in alphabetical order from a file and split into other files

shekhar_4_u · July 25, 2015, 10:15am

Hi All,

I have a file containing thousands of table names in a single column. I want to split that file into multiple files, e.g. one file containing the table names starting with A, another containing those starting with B, and so on through Z.

I tried the command below, but it did not work.

for i in {A..Z}; do grep ^$i ; done< erg.txt

With this logic it only prints the tables starting with A and then stops, whereas it should print all of them in sequence; I could then add some logic to save them alphabetically. Please advise.

Here, erg.txt is the name of the main file. A sample looks like this:

user@86340-hostname:~$ cat erg.txt | head
ABD_DET
ABS_MSTR
ABSC_DET
ABSCC_DET
ABSD_DET
ABSI_MSTR
ABSL_DET
ABSP_DET
ABSPLI_REF
ABSR_DET

vgersh99 · July 25, 2015, 10:43am

awk 'NF{if(f) close(f);f=substr($1,1,1)".txt";print >>f}' erg.txt

RudiC · July 25, 2015, 11:04am

This proposal is nice and short and solves the problem, but if (large chunks of) the input file are sorted, it performs many unnecessary file open/close operations. Try a small adaptation:

awk '
NF      {TMP = substr($1,1,1) ".txt"
         if (FN && FN != TMP) close(FN)
         FN = TMP
         print >> FN
        }
' file
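
To make the open/close argument concrete, here is a minimal sketch that counts close() calls for both variants. The file sample.txt, its contents, and the counter n are made up for illustration; the counter is not part of either posted script.

```shell
# Work in a scratch directory so the demo leaves no stray files behind.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical, already sorted sample input.
printf '%s\n' ABD_DET ABS_MSTR BOM_DET BOM_MSTR CUST_MSTR > sample.txt

# Variant 1: close the current file before every non-blank line.
every_line=$(awk '
NF { if (f) { close(f); n++ }          # n counts close() calls (illustration only)
     f = substr($1, 1, 1) ".txt"
     print >> f }
END { print n + 0 }' sample.txt)

rm -f ./?.txt    # start the second run with no output files

# Variant 2: close only when the target file name changes.
on_change=$(awk '
NF { TMP = substr($1, 1, 1) ".txt"
     if (FN && FN != TMP) { close(FN); n++ }
     FN = TMP
     print >> FN }
END { print n + 0 }' sample.txt)

echo "close() calls, closing on every line: $every_line"
echo "close() calls, closing on change:     $on_change"
```

On sorted input the second count grows with the number of distinct initial letters rather than with the number of lines.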

shekhar_4_u · July 26, 2015, 12:05am

Thanks. It worked like magic.

Could you please explain the logic as well? I have never used FN in awk, so it will be a learning experience for me.

RudiC · July 26, 2015, 1:36am

It extracts the first character from the first field in every line and stores it, extended by the string constant ".txt", into a temp string variable. If this differs from the old file name in variable FN, close the old file. Then assign the temp var to the file name var FN, and append the entire line to this file.
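
The same steps, annotated line by line, might look like the sketch below. Note that FN and TMP are ordinary user-chosen variable names, not awk built-ins (NF, the number of fields on the current line, is the only built-in variable used). The file sample.txt and its contents are made up for illustration.

```shell
# Work in a scratch directory so the demo leaves no stray files behind.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input: table names plus one blank line.
printf '%s\n' ABD_DET ABS_MSTR '' BOM_DET BOM_MSTR CUST_MSTR > sample.txt

awk '
NF {                                   # skip blank lines (NF is 0 there)
    TMP = substr($1, 1, 1) ".txt"      # target file for this line, e.g. "A.txt"
    if (FN && FN != TMP) close(FN)     # initial letter changed: close the previous file
    FN = TMP                           # remember the file currently in use
    print >> FN                        # append the whole line to that file
}
' sample.txt

# Show what ended up in each output file.
for f in A.txt B.txt C.txt; do
    echo "== $f"
    cat "$f"
done
```

Because the script appends with >>, running it twice in the same directory doubles the contents of each output file; remove old output files before a rerun.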

BTW, a small adaptation of your own code snippet would have made it work:

for i in {A..Z}; do grep "^$i" <erg.txt >"$i.txt" ; done

, although it would have opened and read erg.txt 26 times.
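
Why the original loop stopped after A: grep without a file argument reads standard input, and a redirect placed after "done" gives the whole loop one shared input stream. The first grep reads that stream to end-of-file, so the later iterations find nothing left to read. A minimal sketch of the difference, using a made-up sample.txt and only three letters written out so it also runs in shells without bash's {A..Z} expansion:

```shell
# Work in a scratch directory so the demo leaves no stray files behind.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input.
printf '%s\n' ABD_DET ABS_MSTR BOM_DET CUST_MSTR > sample.txt

# Original shape: one redirect for the whole loop. The first grep reads
# the shared stream to end-of-file, so later iterations see no input.
shared=$(for i in A B C; do grep "^$i"; done < sample.txt)

# Corrected shape: each grep opens the file for itself.
separate=$(for i in A B C; do grep "^$i" < sample.txt; done)

printf 'shared stdin:\n%s\n\n' "$shared"
printf 'reopened per letter:\n%s\n' "$separate"
```

The first listing contains only the A lines; the second contains all four names.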

Aia · July 26, 2015, 2:04am

Plus, it would have created all 26 files, A.txt to Z.txt, whether or not grep found any matching lines in erg.txt.
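
One possible way around the empty files, as a sketch only: loop over the initial characters that actually occur in the input instead of over the whole alphabet. This assumes every name starts with a plain letter or digit (nothing that is special to grep or to the shell), and sample.txt is again made up for illustration.

```shell
# Work in a scratch directory so the demo leaves no stray files behind.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input with no names starting with B.
printf '%s\n' ABD_DET ABS_MSTR CUST_MSTR > sample.txt

# cut -c1 keeps the first character of each line; sort -u reduces that
# to the distinct initials, so only those letters get an output file.
for i in $(cut -c1 sample.txt | sort -u); do
    grep "^$i" < sample.txt > "$i.txt"
done

ls
```

It still reads the input once per distinct initial, so the awk approach remains the cheaper one for large files.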

shekhar_4_u · July 27, 2015, 7:15am

Thanks guys! I appreciate your responses.