I have a /tmp directory with files named like this:
010020001_S-FOR-Sort-SYEXC_20160229_2212101.marker
010020001_S-FOR-Sort-SYEXC_20160229_2212102.marker
010020001-S-XOR-Sort-SYEXC_20160229_2212104.marker
010020001-S-XOR-Sort-SYEXC_20160229_2212105.marker
010020001_S-ZOR-Sort-SYEXC_20160229_2212106.marker
010020001-S-FOR-Sort-SYEXC_20160229_2212102.marker
I want to sort these files based on the first 5 fields and then remove duplicates based on those same 5 fields.
I tried the code below:
ls | sort -k1,2,3,4,5
Later I realized there is no need to sort the files at all. I only need to remove the duplicates, because I need only the unique names and the order doesn't matter. So I tried this:
ls | awk -F[_-] '!seen[$1,$2,$3,$4,$5]++'
I got:
010020001_S-FOR-Sort-SYEXC_20160229_2212101.marker
010020001-S-XOR-Sort-SYEXC_20160229_2212104.marker
010020001_S-ZOR-Sort-SYEXC_20160229_2212106.marker
If you look closely, one file is missing, i.e.:
010020001-S-FOR-Sort-SYEXC_20160229_2212102.marker
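Please note the field separators in the first 5 fields: some names have "_" after 010020001 and others have "-", and those should count as different names.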
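A quick way to see why the awk command drops that file is to print the first five fields of two of the names. In the sketch below (first5 is just an illustrative name), -F'[_-]' treats "_" and "-" as the same separator, so both names yield identical fields and therefore the same seen[] key:

```shell
#!/bin/sh
# first5 (hypothetical helper): print the first five fields exactly as
# awk -F'[_-]' splits them.
first5() {
    printf '%s\n' "$1" | awk -F'[_-]' '{ print $1, $2, $3, $4, $5 }'
}

# Two sample names from the question: they differ in the first separator
# ("_" vs "-"), but the [_-] field separator makes them split identically.
first5 010020001_S-FOR-Sort-SYEXC_20160229_2212101.marker
first5 010020001-S-FOR-Sort-SYEXC_20160229_2212102.marker
```

Both calls print the same five fields, which is why !seen[$1,$2,$3,$4,$5]++ keeps only the first of them.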
So my desired output is:
010020001_S-FOR-Sort-SYEXC_20160229_2212101.marker
010020001-S-FOR-Sort-SYEXC_20160229_2212102.marker
010020001-S-XOR-Sort-SYEXC_20160229_2212104.marker
010020001_S-ZOR-Sort-SYEXC_20160229_2212106.marker
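One possible fix, sketched under the assumption that every name ends in _<date>_<sequence>.marker: instead of splitting on [_-], use the whole prefix before that suffix as the key, so the original separators stay part of it. The function name unique_by_prefix is hypothetical:

```shell
#!/bin/sh
# unique_by_prefix (hypothetical): read filenames on stdin and print the first
# name seen for each distinct prefix. Assumes every name ends in
# _<date>_<sequence>.marker; the prefix keeps its "_" and "-" exactly as written.
unique_by_prefix() {
    awk '{
        key = $0
        sub(/_[^_]*_[^_]*$/, "", key)   # strip e.g. "_20160229_2212101.marker"
        if (!seen[key]++) print         # print only the first name per prefix
    }'
}

# Demo on the sample names from the question, in the order listed there.
unique_by_prefix <<'EOF'
010020001_S-FOR-Sort-SYEXC_20160229_2212101.marker
010020001_S-FOR-Sort-SYEXC_20160229_2212102.marker
010020001-S-XOR-Sort-SYEXC_20160229_2212104.marker
010020001-S-XOR-Sort-SYEXC_20160229_2212105.marker
010020001_S-ZOR-Sort-SYEXC_20160229_2212106.marker
010020001-S-FOR-Sort-SYEXC_20160229_2212102.marker
EOF
```

On this list it keeps the same four names as the desired output, in input order. For the real directory, something like cd /tmp && printf '%s\n' *.marker | unique_by_prefix feeds it the names without parsing ls.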
Please help me out with this. I also want to run a for loop over the resulting set: should I delete the duplicate files, or store the unique files in some other directory and then run the for loop? I need some advice on this.
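On the for-loop question, one option that needs neither deleting nor copying is to filter the names on the fly and loop over the result. A sketch, reusing the same assumed prefix filter; process_markers is a made-up name, and the echo stands in for whatever the loop should really do:

```shell
#!/bin/sh
# unique_by_prefix (hypothetical): first name per prefix, assuming every name
# ends in _<date>_<sequence>.marker.
unique_by_prefix() {
    awk '{ key = $0; sub(/_[^_]*_[^_]*$/, "", key); if (!seen[key]++) print }'
}

# process_markers DIR (hypothetical): visit one file per prefix in DIR.
# Nothing is deleted, moved or copied; duplicates are simply skipped.
process_markers() {
    dir=$1
    # The shell expands the glob, so names are not parsed out of ls output.
    for f in "$dir"/*.marker; do
        [ -e "$f" ] || continue        # skip the literal pattern if nothing matched
        printf '%s\n' "${f##*/}"       # basename only, one per line
    done | unique_by_prefix | while IFS= read -r name; do
        echo "processing $dir/$name"   # placeholder for the real work
    done
}

# Demo: a throwaway directory with the sample names, so /tmp itself is untouched.
demo=$(mktemp -d)
for n in 010020001_S-FOR-Sort-SYEXC_20160229_2212101.marker \
         010020001_S-FOR-Sort-SYEXC_20160229_2212102.marker \
         010020001-S-XOR-Sort-SYEXC_20160229_2212104.marker \
         010020001-S-XOR-Sort-SYEXC_20160229_2212105.marker \
         010020001_S-ZOR-Sort-SYEXC_20160229_2212106.marker \
         010020001-S-FOR-Sort-SYEXC_20160229_2212102.marker
do
    : > "$demo/$n"                     # create an empty file with that name
done
process_markers "$demo"
rm -rf "$demo"
```

Filtering at loop time keeps the original files intact, so the duplicates are still there if they are needed later.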
TIA