awk command to split pipe delimited file

Hello,

I need to split a pipe-delimited file based on the value in column 7. Whenever that value changes, a new output file should start.

Source File

Payment|ID|DATE|TIME|CONTROLNUMBER|NUMBER|NAME|INDICATOR
42156974|1137937|10/1/2018|104440|4232|2054391|CARE|1
42156978|1137937|10/1/2018|104440|4232|2054391|CARE|0
42156982|1137937|10/1/2018|104440|4230|2054391|UNIVERSAL|8
42157000|1137937|10/1/2018|104440|4230|2054391|UNIVERSAL|6
42157012|1137937|10/1/2018|104440|4235|2054391|ALLIED|10

Split File 1 : Output File Name should be : 1_CARE

42156974|1137937|10/1/2018|104440|4232|2054391|CARE|1
42156978|1137937|10/1/2018|104440|4232|2054391|CARE|0

Split File 2 : Output File Name should be : 2_UNIVERSAL

42156982|1137937|10/1/2018|104440|4230|2054391|UNIVERSAL|8
42157000|1137937|10/1/2018|104440|4230|2054391|UNIVERSAL|6

Split File 3 : Output File Name should be : 3_ALLIED

42157012|1137937|10/1/2018|104440|4235|2054391|ALLIED|10

Please advise

Have you tried anything, or do you just expect an awk one-liner?

Regards
Peasant.

Hi Peasant

I tried a few options but I am no closer. I captured column seven in a variable; now I need to compare it against the next line and split when it changes.

 awk -F\|  '{ var1=$7; var2=$1; print var1, var2 }' Input.txt
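The idea described above (remember column 7 from the previous line and start a new file when it differs) might look like the sketch below. It is only one possible reading of the requirement: the variable names prev, out and n are made up for illustration, and the sample file is recreated from the data in the question. Note that if a name reappeared later in the file after a different name, this version would start yet another numbered file for it.

```shell
# Recreate the sample input from the question (file name Input.txt as used in this thread).
cat > Input.txt <<'EOF'
Payment|ID|DATE|TIME|CONTROLNUMBER|NUMBER|NAME|INDICATOR
42156974|1137937|10/1/2018|104440|4232|2054391|CARE|1
42156978|1137937|10/1/2018|104440|4232|2054391|CARE|0
42156982|1137937|10/1/2018|104440|4230|2054391|UNIVERSAL|8
42157000|1137937|10/1/2018|104440|4230|2054391|UNIVERSAL|6
42157012|1137937|10/1/2018|104440|4235|2054391|ALLIED|10
EOF

# prev, out and n are illustrative names, not taken from the thread.
awk -F'|' 'NR > 1 {                      # skip the header line
    if (NR == 2 || $7 != prev) {         # first data line, or column 7 changed
        if (out != "") close(out)        # done with the previous output file
        out = (++n) "_" $7               # next sequence number plus the new value
        prev = $7
    }
    print > out                          # write the whole line to the current file
}' Input.txt
```

With the sample data this should leave 1_CARE, 2_UNIVERSAL and 3_ALLIED in the current directory, without the header line in any of them.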

Please advise

awk -F"|" 'NR>1 { a[$0]=$7 } END { for ( i in a ) print i > ("1_" a[i]) } ' input

If you have gigabyte files, a different approach would be needed to minimize memory usage, since array a would become huge on those files.

But that's another problem, which would require a bit larger and more efficient program.
If that is the case, get back here, and we shall think of something.

Hope that helps
Regards
Peasant.


A bit simpler, no memory hogger:

awk -F\| 'NR>1 {if (!X[$7]) X[$7] = ++CNT; print > (X[$7] "_" $7)}' file

If you have a large number of different output files (exceeding system limits) you'll need to append to the files and close them after writing.
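A minimal sketch of that append-and-close variant, assuming the same sample data as in the question. The variable names out and prev are illustrative only, and the rm line is there because >> would otherwise add to files left over from an earlier run. The file is closed only when the target changes, so at most one output file is open at a time.

```shell
# Recreate the sample input from the question (the file name "file" matches the one-liner above).
cat > file <<'EOF'
Payment|ID|DATE|TIME|CONTROLNUMBER|NUMBER|NAME|INDICATOR
42156974|1137937|10/1/2018|104440|4232|2054391|CARE|1
42156978|1137937|10/1/2018|104440|4232|2054391|CARE|0
42156982|1137937|10/1/2018|104440|4230|2054391|UNIVERSAL|8
42157000|1137937|10/1/2018|104440|4230|2054391|UNIVERSAL|6
42157012|1137937|10/1/2018|104440|4235|2054391|ALLIED|10
EOF

# Remove outputs from a previous run, because >> appends to existing files.
rm -f 1_CARE 2_UNIVERSAL 3_ALLIED

# out and prev are illustrative names, not taken from the thread.
awk -F'|' 'NR > 1 {
    if (!($7 in X)) X[$7] = ++CNT        # number each name the first time it is seen
    out = X[$7] "_" $7
    if (out != prev) {                   # switching to a different output file
        if (prev != "") close(prev)      # release the previous file descriptor
        prev = out
    }
    print >> out                         # append, so reopening a file never truncates it
}' file
```

Unlike a version that splits on every change, a name that shows up again later keeps its original number here, because X remembers it.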


Rudi C.

The three output file names all have 1_ as the prefix. Can you please update the code so the prefix is sequential, i.e. 1_xx, 2_xxx, etc.?

Thanks

Mine has 1_ hard-coded.

RudiC's code enumerates properly, creating three files from the current input:

1_CARE  2_UNIVERSAL  3_ALLIED

Regards
Peasant.


Peasant,

Did you try running RudiC's code?

Thanks

Of course I did.
Just a moment ago, and several days ago when I examined it to learn.

Regards
Peasant.

Thank you Rudi C & Peasant