Sort mixed data file

I have a text file and each field is separated by semicolon ( ; ). Field number 7 is internally separated by comma ( , ) and pipe ( | ) symbol. I want to sort file based on three different fields which are marked in BOLD.

Here first BOLD field will have numbers upto the length of 9 characters, second BOLD field will have start date in 'YYYYMMDD' format and third field will have end date in 'YYYYMMDD' format.

Input File:

 
REQSTS;00002315000000000011;1548356967;EIN;390606261;;FFFF_SAK,1,100010463,Y,0|FFFF_NPI,1,1548356967,N,1|FFFF_YYY_ID,1,100010463,N,2;;;1699720086;;;;EP;20120103;20120401;GC;2013
REQSTS;00002315000000000001;1316908164;EIN;310806261;;AAAA_SAK,1,18663,Y,0|AAAA_NPI,1,1316998164,N,1|AAAA_TTT_ID,1,30370300,N,2;;;1699720086;;;;EP;20130101;20130331;GC;2013
REQSTS;00002315000000000003;1720192893;EIN;320806261;;BBBB_SAK,1,18663,Y,0|BBBB_NPI,1,1720172893,N,1|BBBB_UUU_ID,1,100002999,N,2;;;1699720086;;;;EP;20130101;20130430;GC;2013
REQSTS;00002315000000000013;1366449767;EIN;390806961;;GGGG_SAK,1,34082,Y,0|GGGG_NPI,1,1366489767,N,1|GGGG_ZZZ_ID,1,32562000,N,2;;;1699720086;;;;EP;20120203;20120301;GC;2013
REQSTS;00002315000000000005;1003888704;EIN;390836261;;CCCC_SAK,1,18663,Y,0|CCCC_NPI,1,1003868704,N,1|CCCC_VVV_ID,1,34394500,N,2;;;1699720086;;;;EP;20121201;20130131;GC;2013
REQSTS;00002315000000000009;1174668474;EIN;272042615;;EEEE_SAK,1,100009394,Y,0|EEEE_NPI,1,1174618474,N,1|EEEE_XXX_ID,1,100009394,N,2;;;1699720086;;;;EP;20120103;20120401;GC;2013
REQSTS;00002315000000000007;1992777660;EIN;394806261;;DDDD_SAK,1,18663,Y,0|DDDD_NPI,1,1992757660,N,1|DDDD_WWW_ID,1,31598400,N,2;;;1699720086;;;;EP;20121201;20130531;GC;2013
 

Required output: First and second field should be in ascending order, and third field should be in descending order.

Output file:

REQSTS;00002315000000000007;1992777660;EIN;394806261;;DDDD_SAK,1,18663,Y,0|DDDD_NPI,1,1992757660,N,1|DDDD_WWW_ID,1,31598400,N,2;;;1699720086;;;;EP;20121201;20130531;GC;2013
REQSTS;00002315000000000005;1003888704;EIN;390836261;;CCCC_SAK,1,18663,Y,0|CCCC_NPI,1,1003868704,N,1|CCCC_VVV_ID,1,34394500,N,2;;;1699720086;;;;EP;20121201;20130131;GC;2013
REQSTS;00002315000000000003;1720192893;EIN;320806261;;BBBB_SAK,1,18663,Y,0|BBBB_NPI,1,1720172893,N,1|BBBB_UUU_ID,1,100002999,N,2;;;1699720086;;;;EP;20130101;20130430;GC;2013
REQSTS;00002315000000000001;1316908164;EIN;310806261;;AAAA_SAK,1,18663,Y,0|AAAA_NPI,1,1316998164,N,1|AAAA_TTT_ID,1,30370300,N,2;;;1699720086;;;;EP;20130101;20130331;GC;2013
REQSTS;00002315000000000013;1366449767;EIN;390806961;;GGGG_SAK,1,34082,Y,0|GGGG_NPI,1,1366489767,N,1|GGGG_ZZZ_ID,1,32562000,N,2;;;1699720086;;;;EP;20120203;20120301;GC;2013
REQSTS;00002315000000000009;1174668474;EIN;272042615;;EEEE_SAK,1,100009394,Y,0|EEEE_NPI,1,1174618474,N,1|EEEE_XXX_ID,1,100009394,N,2;;;1699720086;;;;EP;20120103;20120401;GC;2013
REQSTS;00002315000000000011;1548356967;EIN;390606261;;FFFF_SAK,1,100010463,Y,0|FFFF_NPI,1,1548356967,N,1|FFFF_YYY_ID,1,100010463,N,2;;;1699720086;;;;EP;20120103;20120401;GC;2013

Thanks!

Here is one solution assuming your file will not contain '%':

 awk -F';' '{split($7,a,","); print a[3] "%" $15 "%" $16 "%" $0}' input.txt | sort -t '%' -k 1n -k 2n -k 3nr | awk -F'%' '{print $4}' 
REQSTS;00002315000000000007;1992777660;EIN;394806261;;DDDD_SAK,1,18663,Y,0|DDDD_NPI,1,1992757660,N,1|DDDD_WWW_ID,1,31598400,N,2;;;1699720086;;;;EP;20121201;20130531;GC;2013
REQSTS;00002315000000000005;1003888704;EIN;390836261;;CCCC_SAK,1,18663,Y,0|CCCC_NPI,1,1003868704,N,1|CCCC_VVV_ID,1,34394500,N,2;;;1699720086;;;;EP;20121201;20130131;GC;2013
REQSTS;00002315000000000003;1720192893;EIN;320806261;;BBBB_SAK,1,18663,Y,0|BBBB_NPI,1,1720172893,N,1|BBBB_UUU_ID,1,100002999,N,2;;;1699720086;;;;EP;20130101;20130430;GC;2013
REQSTS;00002315000000000001;1316908164;EIN;310806261;;AAAA_SAK,1,18663,Y,0|AAAA_NPI,1,1316998164,N,1|AAAA_TTT_ID,1,30370300,N,2;;;1699720086;;;;EP;20130101;20130331;GC;2013
REQSTS;00002315000000000013;1366449767;EIN;390806961;;GGGG_SAK,1,34082,Y,0|GGGG_NPI,1,1366489767,N,1|GGGG_ZZZ_ID,1,32562000,N,2;;;1699720086;;;;EP;20120203;20120301;GC;2013
REQSTS;00002315000000000009;1174668474;EIN;272042615;;EEEE_SAK,1,100009394,Y,0|EEEE_NPI,1,1174618474,N,1|EEEE_XXX_ID,1,100009394,N,2;;;1699720086;;;;EP;20120103;20120401;GC;2013
REQSTS;00002315000000000011;1548356967;EIN;390606261;;FFFF_SAK,1,100010463,Y,0|FFFF_NPI,1,1548356967,N,1|FFFF_YYY_ID,1,100010463,N,2;;;1699720086;;;;EP;20120103;20120401;GC;2013

It worked. Thanks!

I have a text file and each field is separated by semicolon ( ; ). Field number 7 is internally separated by comma ( , ) and pipe ( | ) symbol. I want to sort file based on four different fields which are marked in BOLD.

Here first BOLD field will have numbers upto the length of 9 characters, second BOLD field will have start date in 'YYYYMMDD' format, third field will have end date in 'YYYYMMDD' format and fourth BOLD field will have string of 2 characters (Like "IP", "GC" ) or two WHITE SPACE.

Input File:

 
REQSTS;00002315000000000011;1548356967;EIN;390606261;;FFFF_SAK,1,100010463,Y,0|FFFF_NPI,1,1548356967,N,1|FFFF_YYY_ID,1,100010463,N,2;;;1699720086;;;;EP;20120103;20120401;GC;2013
REQSTS;00002315000000000001;1316908164;EIN;310806261;;AAAA_SAK,1,18663,Y,0|AAAA_NPI,1,1316998164,N,1|AAAA_TTT_ID,1,30370300,N,2;;;1699720086;;;;EP;20130101;20130331;IP;2013
REQSTS;00002315000000000002;1316908164;EIN;310806261;;AAAA_SAK,1,18663,Y,0|AAAA_NPI,1,1316998164,N,1|AAAA_TTT_ID,1,30370300,N,2;;;1699720086;;;;EP;20121201;20130229;IP;2013
REQSTS;00002315000000000022;1316908164;EIN;310806261;;AAAA_SAK,1,18663,Y,0|AAAA_NPI,1,1316998164,N,1|AAAA_TTT_ID,1,30370300,N,2;;;1699720086;;;;EP;20121201;20130101;IP;2013
REQSTS;00002315000000000003;1720192893;EIN;320806261;;BBBB_SAK,1,18663,Y,0|BBBB_NPI,1,1720172893,N,1|BBBB_UUU_ID,1,100002999,N,2;;;1699720086;;;;EP;20130101;20130430;GC;2013
REQSTS;00002315000000000013;1366449767;EIN;390806961;;GGGG_SAK,1,34082,Y,0|GGGG_NPI,1,1366489767,N,1|GGGG_ZZZ_ID,1,32562000,N,2;;;1699720086;;;;EP;20120203;20120301;IP;2013
REQSTS;00002315000000000005;1003888704;EIN;390836261;;CCCC_SAK,1,18663,Y,0|CCCC_NPI,1,1003868704,N,1|CCCC_VVV_ID,1,34394500,N,2;;;1699720086;;;;EP;20121201;20130131;  ;2013
REQSTS;00002315000000000009;1174668474;EIN;272042615;;EEEE_SAK,1,100009394,Y,0|EEEE_NPI,1,1174618474,N,1|EEEE_XXX_ID,1,100009394,N,2;;;1699720086;;;;EP;20120103;20120401;IP;2013
REQSTS;00002315000000000007;1992777660;EIN;394806261;;DDDD_SAK,1,18663,Y,0|DDDD_NPI,1,1992757660,N,1|DDDD_WWW_ID,1,31598400,N,2;;;1699720086;;;;EP;20121101;20130531;  ;2013
REQSTS;00002315000000000016;1548356967;EIN;390606261;;FFFF_SAK,1,100010463,Y,0|FFFF_NPI,1,1548356967,N,1|FFFF_YYY_ID,1,100010463,N,2;;;1699720086;;;;EP;20110203;20110501;GC;2013 

Required output: Sorting should happen based on following four fields in same sequence:-

(1) Fourth BOLD field should be in decending order (CHARACTER)
(2) First BOLD field should be in ascending order (NUMBER)
(3) Second BOLD field should be in ascending order (NUMBER)
(4) Third BOLD field should be in descending order (NUMBER)

Output file:

REQSTS;00002315000000000002;1316908164;EIN;310806261;;AAAA_SAK,1,18663,Y,0|AAAA_NPI,1,1316998164,N,1|AAAA_TTT_ID,1,30370300,N,2;;;1699720086;;;;EP;20121201;20130229;IP;2013
REQSTS;00002315000000000022;1316908164;EIN;310806261;;AAAA_SAK,1,18663,Y,0|AAAA_NPI,1,1316998164,N,1|AAAA_TTT_ID,1,30370300,N,2;;;1699720086;;;;EP;20121201;20130101;IP;2013
REQSTS;00002315000000000001;1316908164;EIN;310806261;;AAAA_SAK,1,18663,Y,0|AAAA_NPI,1,1316998164,N,1|AAAA_TTT_ID,1,30370300,N,2;;;1699720086;;;;EP;20130101;20130331;IP;2013
REQSTS;00002315000000000013;1366449767;EIN;390806961;;GGGG_SAK,1,34082,Y,0|GGGG_NPI,1,1366489767,N,1|GGGG_ZZZ_ID,1,32562000,N,2;;;1699720086;;;;EP;20120203;20120301;IP;2013
REQSTS;00002315000000000009;1174668474;EIN;272042615;;EEEE_SAK,1,100009394,Y,0|EEEE_NPI,1,1174618474,N,1|EEEE_XXX_ID,1,100009394,N,2;;;1699720086;;;;EP;20120103;20120401;IP;2013
REQSTS;00002315000000000003;1720192893;EIN;320806261;;BBBB_SAK,1,18663,Y,0|BBBB_NPI,1,1720172893,N,1|BBBB_UUU_ID,1,100002999,N,2;;;1699720086;;;;EP;20130101;20130430;GC;2013
REQSTS;00002315000000000016;1548356967;EIN;390606261;;FFFF_SAK,1,100010463,Y,0|FFFF_NPI,1,1548356967,N,1|FFFF_YYY_ID,1,100010463,N,2;;;1699720086;;;;EP;20110203;20110501;GC;2013
REQSTS;00002315000000000011;1548356967;EIN;390606261;;FFFF_SAK,1,100010463,Y,0|FFFF_NPI,1,1548356967,N,1|FFFF_YYY_ID,1,100010463,N,2;;;1699720086;;;;EP;20120103;20120401;GC;2013
REQSTS;00002315000000000007;1992777660;EIN;394806261;;DDDD_SAK,1,18663,Y,0|DDDD_NPI,1,1992757660,N,1|DDDD_WWW_ID,1,31598400,N,2;;;1699720086;;;;EP;20121101;20130531;  ;2013
REQSTS;00002315000000000005;1003888704;EIN;390836261;;CCCC_SAK,1,18663,Y,0|CCCC_NPI,1,1003868704,N,1|CCCC_VVV_ID,1,34394500,N,2;;;1699720086;;;;EP;20121201;20130131;  ;2013

I tried below command but it is not showing field 17th (i.e., fourth BOLD field) in the output.

 
awk -F';' '{$17 "%" split($7,a,","); print a[3] "%" $15 "%" $16 "%" $0}' temp.dat
 
100010463%20120103%20120401%CLMREQ;00002315000000000011;1548356967;EIN;390606261;;FFFF_SAK,1,100010463,Y,0|FFFF_NPI,1,1548356967,N,1|FFFF_YYY_ID,1,100010463,N,2;;;1699720086;;;;EP;20120103;20120401;GC;2013
18663%20130101%20130331%CLMREQ;00002315000000000001;1316908164;EIN;310806261;;AAAA_SAK,1,18663,Y,0|AAAA_NPI,1,1316998164,N,1|AAAA_TTT_ID,1,30370300,N,2;;;1699720086;;;;EP;20130101;20130331;IP;2013
18663%20121201%20130229%CLMREQ;00002315000000000002;1316908164;EIN;310806261;;AAAA_SAK,1,18663,Y,0|AAAA_NPI,1,1316998164,N,1|AAAA_TTT_ID,1,30370300,N,2;;;1699720086;;;;EP;20121201;20130229;IP;2013
18663%20121201%20130101%CLMREQ;00002315000000000022;1316908164;EIN;310806261;;AAAA_SAK,1,18663,Y,0|AAAA_NPI,1,1316998164,N,1|AAAA_TTT_ID,1,30370300,N,2;;;1699720086;;;;EP;20121201;20130101;IP;2013
18663%20130101%20130430%CLMREQ;00002315000000000003;1720192893;EIN;320806261;;BBBB_SAK,1,18663,Y,0|BBBB_NPI,1,1720172893,N,1|BBBB_UUU_ID,1,100002999,N,2;;;1699720086;;;;EP;20130101;20130430;GC;2013
34082%20120203%20120301%CLMREQ;00002315000000000013;1366449767;EIN;390806961;;GGGG_SAK,1,34082,Y,0|GGGG_NPI,1,1366489767,N,1|GGGG_ZZZ_ID,1,32562000,N,2;;;1699720086;;;;EP;20120203;20120301;IP;2013
18663%20121201%20130131%CLMREQ;00002315000000000005;1003888704;EIN;390836261;;CCCC_SAK,1,18663,Y,0|CCCC_NPI,1,1003868704,N,1|CCCC_VVV_ID,1,34394500,N,2;;;1699720086;;;;EP;20121201;20130131;  ;2013
100009394%20120103%20120401%CLMREQ;00002315000000000009;1174668474;EIN;272042615;;EEEE_SAK,1,100009394,Y,0|EEEE_NPI,1,1174618474,N,1|EEEE_XXX_ID,1,100009394,N,2;;;1699720086;;;;EP;20120103;20120401;IP;2013
18663%20121101%20130531%CLMREQ;00002315000000000007;1992777660;EIN;394806261;;DDDD_SAK,1,18663,Y,0|DDDD_NPI,1,1992757660,N,1|DDDD_WWW_ID,1,31598400,N,2;;;1699720086;;;;EP;20121101;20130531;  ;2013
100010463%20110203%20110501%CLMREQ;00002315000000000016;1548356967;EIN;390606261;;FFFF_SAK,1,100010463,Y,0|FFFF_NPI,1,1548356967,N,1|FFFF_YYY_ID,1,100010463,N,2;;;1699720086;;;;EP;20110203;20110501;GC;2013

Thanks!

What about analysing and trying to understand mjf's suggestion, and then adapting it YOURSELF to your new specification?

Sorry, I did a mistake here. The awk command should be;

awk -F';' '{split($7,a,","); print $17 "%" a[3] "%" $15 "%" $16 "%" $0}' temp1.dat | sort -t '%' -k 1r -k 2n -k 3n -k 4nr

It gave me following output but it did not sort first bold field correctly. Still trying....

 
IP%34082%20120203%20120301%CLMREQ;00002315000000000013;1366449767;EIN;390806961;;GGGG_SAK,1,34082,Y,0|GGGG_NPI,1,1366489767,N,1|GGGG_ZZZ_ID,1,32562000,N,2;;;1699720086;;;;EP;20120203;20120301;IP;2013
IP%18663%20130101%20130331%CLMREQ;00002315000000000001;1316908164;EIN;310806261;;AAAA_SAK,1,18663,Y,0|AAAA_NPI,1,1316998164,N,1|AAAA_TTT_ID,1,30370300,N,2;;;1699720086;;;;EP;20130101;20130331;IP;2013
IP%18663%20121201%20130229%CLMREQ;00002315000000000002;1316908164;EIN;310806261;;AAAA_SAK,1,18663,Y,0|AAAA_NPI,1,1316998164,N,1|AAAA_TTT_ID,1,30370300,N,2;;;1699720086;;;;EP;20121201;20130229;IP;2013
IP%18663%20121201%20130101%CLMREQ;00002315000000000022;1316908164;EIN;310806261;;AAAA_SAK,1,18663,Y,0|AAAA_NPI,1,1316998164,N,1|AAAA_TTT_ID,1,30370300,N,2;;;1699720086;;;;EP;20121201;20130101;IP;2013
IP%100009394%20120103%20120401%CLMREQ;00002315000000000009;1174668474;EIN;272042615;;EEEE_SAK,1,100009394,Y,0|EEEE_NPI,1,1174618474,N,1|EEEE_XXX_ID,1,100009394,N,2;;;1699720086;;;;EP;20120103;20120401;IP;2013
GC%18663%20130101%20130430%CLMREQ;00002315000000000003;1720192893;EIN;320806261;;BBBB_SAK,1,18663,Y,0|BBBB_NPI,1,1720172893,N,1|BBBB_UUU_ID,1,100002999,N,2;;;1699720086;;;;EP;20130101;20130430;GC;2013
GC%100010463%20120103%20120401%CLMREQ;00002315000000000011;1548356967;EIN;390606261;;FFFF_SAK,1,100010463,Y,0|FFFF_NPI,1,1548356967,N,1|FFFF_YYY_ID,1,100010463,N,2;;;1699720086;;;;EP;20120103;20120401;GC;2013
GC%100010463%20110203%20110501%CLMREQ;00002315000000000016;1548356967;EIN;390606261;;FFFF_SAK,1,100010463,Y,0|FFFF_NPI,1,1548356967,N,1|FFFF_YYY_ID,1,100010463,N,2;;;1699720086;;;;EP;20110203;20110501;GC;2013
  %18663%20121201%20130131%CLMREQ;00002315000000000005;1003888704;EIN;390836261;;CCCC_SAK,1,18663,Y,0|CCCC_NPI,1,1003868704,N,1|CCCC_VVV_ID,1,34394500,N,2;;;1699720086;;;;EP;20121201;20130131;  ;2013
  %18663%20121101%20130531%CLMREQ;00002315000000000007;1992777660;EIN;394806261;;DDDD_SAK,1,18663,Y,0|DDDD_NPI,1,1992757660,N,1|DDDD_WWW_ID,1,31598400,N,2;;;1699720086;;;;EP;20121101;20130531;  ;2013

Thanks!

You should modify the sort parameters:

sort -t '%' -k1,1r -k2,2n -k3,3n -k4,4nr

-k2n starts a key (field) 2 and runs to the end of line.
-k2,2 starts at 2 and stops at 2.