Split file based on a column/field value

galaxy_rocky · August 27, 2014, 5:38am

Hi All,

I have a requirement to split a file into 2 files. Below is some sample data from the file:

AU;PTN;24EX;25-AUG-14;AU;123;SE;123;Test NN;;;;ASD;
AU;PTN;24EX;25-AUG-14;AU;456;SE;456;Test NN;;;;ASD;
AU;PTN;24EX;25-AUG-14;AU;147;SE;147;Test NN;;;;ASD;
AU;PTN;24EX;25-AUG-14;AU;369;SE;369;Test NN;;;;ASD;
AU;PTN;24EX;25-AUG-14;AU;789;SE;789;Test NN;;;;ASD;
AU;PTN;24EX;25-AUG-14;AU;369;SE;369;Test NN;;;;QWE;
AU;PTN;24EX;25-AUG-14;AU;789;SE;789;Test NN;;;;RTY;
AU;PTN;24EX;25-AUG-14;AU;789;SE;789;Test NN;;;;UIO;

The fields are separated by ";".
I want the file to be split based on the 13th field.
Based on the 13th field, I need all ASD-related data in one file, and the remaining rows (QWE, RTY and UIO) in another file.

I came up with

awk -F';' '{print > $13".txt"}' input_file

But this generates 4 files: one each for ASD, QWE, RTY and UIO. I need only 2 files, i.e. one file for the ASD-related rows and another file for all the remaining rows.

Please help me to achieve this!

Regards,

SriniShoo · August 27, 2014, 5:43am

awk -F ';' '$13 == "ASD" {print > ("ASD.txt"); next} {print > ("REST.txt")}' input_file

junior-helper · August 27, 2014, 6:52am

With field 13 being the last field at the same time, this approach may be sufficient:

$ grep -E 'ASD;$' in > out.asd
$ grep -vE 'ASD;$' in > out.other
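
A quick sanity check on the grep approach is that the two output files together contain every input line exactly once. The following is only a minimal sketch: it assumes the input is named in, as above, and builds it in a scratch directory from a few of the sample rows in the first post.

```shell
#!/bin/sh
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A few rows in the thread's format; field 13 is the last field before the trailing ";".
cat > in <<'EOF'
AU;PTN;24EX;25-AUG-14;AU;123;SE;123;Test NN;;;;ASD;
AU;PTN;24EX;25-AUG-14;AU;456;SE;456;Test NN;;;;ASD;
AU;PTN;24EX;25-AUG-14;AU;369;SE;369;Test NN;;;;QWE;
AU;PTN;24EX;25-AUG-14;AU;789;SE;789;Test NN;;;;RTY;
AU;PTN;24EX;25-AUG-14;AU;789;SE;789;Test NN;;;;UIO;
EOF

grep -E 'ASD;$' in > out.asd      # lines ending in ASD;
grep -vE 'ASD;$' in > out.other   # every other line

# Count lines; tr strips the padding some wc implementations add.
total=$(wc -l < in | tr -d ' ')
asd=$(wc -l < out.asd | tr -d ' ')
other=$(wc -l < out.other | tr -d ' ')
echo "total=$total asd=$asd other=$other"
```

Note that 'ASD;$' also matches a last field that merely ends in ASD (for example XASD). If such values can occur, ';ASD;$' is the stricter pattern.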

RavinderSingh13 · August 27, 2014, 7:33am

Hello,

Here is one more approach, which reads the file twice. Let us say the input file is named test7.

awk -F";" 'NR==FNR{a["ASD"]=$0;next} ($13 in a){print $0 >> $13"_"FILENAME".txt"} !($13 in a){print $0 >> "Other.txt"}' OFS=";" test7 test7

It will create 2 files named ASD_test7.txt and Other.txt as follows

cat ASD_test7.txt
AU;PTN;24EX;25-AUG-14;AU;123;SE;123;Test NN;;;;ASD;
AU;PTN;24EX;25-AUG-14;AU;456;SE;456;Test NN;;;;ASD;
AU;PTN;24EX;25-AUG-14;AU;147;SE;147;Test NN;;;;ASD;
AU;PTN;24EX;25-AUG-14;AU;369;SE;369;Test NN;;;;ASD;
AU;PTN;24EX;25-AUG-14;AU;789;SE;789;Test NN;;;;ASD;
 
cat Other.txt
AU;PTN;24EX;25-AUG-14;AU;369;SE;369;Test NN;;;;QWE;
AU;PTN;24EX;25-AUG-14;AU;789;SE;789;Test NN;;;;RTY;
AU;PTN;24EX;25-AUG-14;AU;789;SE;789;Test NN;;;;UIO;

EDIT: Without reading the file twice. Thanks to Scrutinizer for the suggestion.

awk -F";" '($13=="ASD"){print $0 >> "ASD_"FILENAME".txt"} ($13!="ASD"){print $0 >> "Others.txt"}' test7
OR
awk -F";" '{if($13=="ASD"){print $0 >> "ASD_"FILENAME".txt"} else {print $0 >> "Others.txt"}}' test7

Output will be as follows.

cat Others.txt
AU;PTN;24EX;25-AUG-14;AU;369;SE;369;Test NN;;;;QWE;
AU;PTN;24EX;25-AUG-14;AU;789;SE;789;Test NN;;;;RTY;
AU;PTN;24EX;25-AUG-14;AU;789;SE;789;Test NN;;;;UIO;
 
cat ASD_test7.txt
AU;PTN;24EX;25-AUG-14;AU;123;SE;123;Test NN;;;;ASD;
AU;PTN;24EX;25-AUG-14;AU;456;SE;456;Test NN;;;;ASD;
AU;PTN;24EX;25-AUG-14;AU;147;SE;147;Test NN;;;;ASD;
AU;PTN;24EX;25-AUG-14;AU;369;SE;369;Test NN;;;;ASD;
AU;PTN;24EX;25-AUG-14;AU;789;SE;789;Test NN;;;;ASD;
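
One thing to keep in mind with ">>" in these commands: awk appends to whatever is already in the output files, so running the command a second time duplicates the rows. With ">" awk truncates each file the first time it opens it in a run. The sketch below illustrates the difference; the file name Others_once.txt and the two-row test7 are assumptions made up for this illustration.

```shell
#!/bin/sh
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two rows in the thread's format: one ASD row, one non-ASD row.
printf '%s\n' \
  'AU;PTN;24EX;25-AUG-14;AU;123;SE;123;Test NN;;;;ASD;' \
  'AU;PTN;24EX;25-AUG-14;AU;369;SE;369;Test NN;;;;QWE;' > test7

# Run the split twice with ">>": the second run appends to the first run's output.
for run in 1 2; do
    awk -F';' '{ if ($13 == "ASD") { print >> ("ASD_" FILENAME ".txt") } else { print >> "Others.txt" } }' test7
done
append_lines=$(wc -l < Others.txt | tr -d ' ')

# Run it twice with ">": each run starts the output file from scratch.
for run in 1 2; do
    awk -F';' '{ if ($13 == "ASD") { print > ("ASD_" FILENAME ".txt") } else { print > "Others_once.txt" } }' test7
done
truncate_lines=$(wc -l < Others_once.txt | tr -d ' ')

echo "with >>: $append_lines line(s), with >: $truncate_lines line(s)"
```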

Thanks,
R. Singh

Scrutinizer · August 27, 2014, 8:03am

@ravinder, using an array with a single element, overwriting its content with an arbitrary line for every record, and reading the file twice are all unnecessary. Instead of $13 in a you can use $13=="ASD", and then you can leave out the NR==FNR section.
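
A minimal sketch of what this suggestion amounts to, with the value to match passed in through -v rather than hard-coded. The variable name key and the output names ASD.txt and REST.txt are assumptions chosen for illustration; the sample rows are a subset of the data in the first post.

```shell
#!/bin/sh
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample rows in the thread's format (field 13 is ASD, QWE or RTY here).
cat > input_file <<'EOF'
AU;PTN;24EX;25-AUG-14;AU;123;SE;123;Test NN;;;;ASD;
AU;PTN;24EX;25-AUG-14;AU;456;SE;456;Test NN;;;;ASD;
AU;PTN;24EX;25-AUG-14;AU;369;SE;369;Test NN;;;;QWE;
AU;PTN;24EX;25-AUG-14;AU;789;SE;789;Test NN;;;;RTY;
EOF

# Single pass: compare field 13 directly, no array and no second read.
awk -F';' -v key="ASD" '
    $13 == key { print > (key ".txt"); next }   # rows whose 13th field equals key
               { print > "REST.txt" }           # everything else
' input_file

# Count lines; tr strips the padding some wc implementations add.
asd_count=$(wc -l < ASD.txt | tr -d ' ')
rest_count=$(wc -l < REST.txt | tr -d ' ')
echo "ASD=$asd_count REST=$rest_count"
```

Passing the value with -v keeps the awk program unchanged if a different field value has to be separated out later.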

RudiC · August 27, 2014, 2:25pm

awk -F';' '"ASD"==$13 {print > "ASD.txt";next}1' input_file >other.txt

galaxy_rocky · September 1, 2014, 2:50am

Thank you all for the replies. This solved my issue.
Thanks once again for your valuable time.

Regards,

Thanks Srini !!