Hello,
I need to split a pipe-delimited file based on the value in column 7. Whenever that value changes, a new output file should start.
Source File
Payment|ID|DATE|TIME|CONTROLNUMBER|NUMBER|NAME|INDICATOR
42156974|1137937|10/1/2018|104440|4232|2054391|CARE|1
42156978|1137937|10/1/2018|104440|4232|2054391|CARE|0
42156982|1137937|10/1/2018|104440|4230|2054391|UNIVERSAL|8
42157000|1137937|10/1/2018|104440|4230|2054391|UNIVERSAL|6
42157012|1137937|10/1/2018|104440|4235|2054391|ALLIED|10
Split File 1: Output file name should be 1_CARE
42156974|1137937|10/1/2018|104440|4232|2054391|CARE|1
42156978|1137937|10/1/2018|104440|4232|2054391|CARE|0
Split File 2: Output file name should be 2_UNIVERSAL
42156982|1137937|10/1/2018|104440|4230|2054391|UNIVERSAL|8
42157000|1137937|10/1/2018|104440|4230|2054391|UNIVERSAL|6
Split File 3: Output file name should be 3_ALLIED
42157012|1137937|10/1/2018|104440|4235|2054391|ALLIED|10
Please advise
Have you tried anything, or do you just expect an awk one-liner?
Regards
Peasant.
Hi Peasant
I tried a few options but I am nowhere close. I captured column seven in a variable; I now need to check this variable against the next line and split the file when it changes.
awk -F\| '{ var1=$7; var2=$1; print var1, var2 }' Input.txt
Please advise
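The idea described in this post (compare column 7 of each line with the previous line and start a new file when it differs) could be sketched roughly as below. This is only one possible reading of the requirement, not code from the thread: the variable names prev, out and cnt are made up for illustration, the header is assumed to be line 1, and the sample file is recreated from the data in the first post so the snippet runs on its own.

```shell
# Recreate the sample input from the first post.
cat > Input.txt <<'EOF'
Payment|ID|DATE|TIME|CONTROLNUMBER|NUMBER|NAME|INDICATOR
42156974|1137937|10/1/2018|104440|4232|2054391|CARE|1
42156978|1137937|10/1/2018|104440|4232|2054391|CARE|0
42156982|1137937|10/1/2018|104440|4230|2054391|UNIVERSAL|8
42157000|1137937|10/1/2018|104440|4230|2054391|UNIVERSAL|6
42157012|1137937|10/1/2018|104440|4235|2054391|ALLIED|10
EOF

awk -F'|' '
NR == 1 { next }                  # skip the header line (assumed to be line 1)
NR == 2 || $7 != prev {           # first data line, or column 7 changed
    if (out != "") close(out)     # finished with the previous output file
    out = (++cnt) "_" $7          # next sequential name, e.g. 2_UNIVERSAL
    prev = $7
}
{ print > out }                   # write the line to the current output file
' Input.txt
```

Only one output file is open at a time, and nothing is held in memory. Note that if a value shows up again later in the file after a different one, this sketch starts a new numbered file for it rather than reusing the earlier one.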
awk -F"|" 'NR>1 { a[$0]=$7 } END { for ( i in a ) print i > ("1_" a[i]) }' input
If you have gigabyte files, a different approach would be needed to minimize memory usage, since array a would become huge on those files.
But that's another problem, which would require a bit larger and more efficient program.
If that is the case, get back here, and we shall think of something.
Hope that helps
Regards
Peasant.
RudiC (October 7, 2018, 3:58am):
A bit simpler, and no memory hog:
awk -F\| 'NR>1 {if (!X[$7]) X[$7] = ++CNT; print > (X[$7] "_" $7)}' file
If you have a large number of different output files (exceeding the system's limit on open files), you'll need to append to the files and close each one after writing.
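The append-and-close variant mentioned above might look something like the following. It is a sketch rather than RudiC's own code: the variable out is introduced here for illustration, the sample file is rebuilt from the data in the first post, and the rm line assumes the three output names produced by that sample.

```shell
# Recreate the sample input from the first post.
cat > file <<'EOF'
Payment|ID|DATE|TIME|CONTROLNUMBER|NUMBER|NAME|INDICATOR
42156974|1137937|10/1/2018|104440|4232|2054391|CARE|1
42156978|1137937|10/1/2018|104440|4232|2054391|CARE|0
42156982|1137937|10/1/2018|104440|4230|2054391|UNIVERSAL|8
42157000|1137937|10/1/2018|104440|4230|2054391|UNIVERSAL|6
42157012|1137937|10/1/2018|104440|4235|2054391|ALLIED|10
EOF

# ">>" appends, so leftovers from an earlier run must be removed first.
rm -f 1_CARE 2_UNIVERSAL 3_ALLIED

awk -F'|' '
NR > 1 {
    if (!($7 in X)) X[$7] = ++CNT   # number each distinct value when first seen
    out = X[$7] "_" $7
    print >> out                    # append, so reopening does not truncate
    close(out)                      # keep at most one output file open
}' file
```

Closing after every line is slow on big inputs. If the rows for each value are mostly contiguous, closing only when column 7 changes would cut down the open/close calls considerably.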
Rudi C,
All three output file names have 1_ as a prefix. Can you please update the code to number them sequentially, i.e. 1_xx, 2_xxx, etc.?
Thanks
Peasant (October 10, 2018, 11:52pm):
Mine has 1_ hard-coded.
RudiC's code enumerates properly, creating three files from the current input:
1_CARE 2_UNIVERSAL 3_ALLIED
Regards
Peasant.
Peasant,
Did you try running RudiC's code?
Thanks
Peasant (October 11, 2018, 12:21am):
Of course I did.
Just a moment ago, and several days ago when I examined it to learn.
Regards
Peasant.
Thank you Rudi C & Peasant