How do I include the file being compared in the calculation?

nawk -F, 'NR==FNR{file=FILENAME;a[$1OFS$2OFS$3]++;next} a[$1OFS$2OFS$3]{if(FILENAME~file)next;b[$1OFS$2OFS$3]++;}
END{ for(i in a){if(a[i] && !b[i]){print "NEW: "i}} for(i in b){if(b[i])print i"\t\t"b[i]}}' OFS=, 123.csv *.csv

I need to include 123.csv in the total output. Currently the script compares whatever is in 123.csv against everything else and prints the results, but I need it to count 123.csv itself in the total, i.e.

NE:20300468,SHELF:5,SLOT:5              21

Currently the above output excludes the 123.csv file.


---------- Post updated at 01:28 PM ---------- Previous update was at 05:59 AM ----------

Please help?

nawk -F, 'NR==FNR{a[$1OFS$2OFS$3]++;next} a[$1OFS$2OFS$3]{b[$1OFS$2OFS$3]++;}
END{ for(i in a){if(a[i] && !b[i]){print "NEW: "i}} for(i in b){if(b[i])print i"\t\t"b[i]}}' OFS=, 123.csv *.csv

--ahamed
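
Why dropping the FILENAME check changes the totals can be seen with a small sketch. The file names (ref.csv, old.csv), the sample rows, and the count helper are made up for illustration, and plain awk is used in place of nawk. ref.csv appears on the command line twice, once explicitly and once through the glob, so without the check its own rows are counted in b as well.

```shell
# Hypothetical sample data: ref.csv plays the role of 123.csv.
dir=$(mktemp -d)
printf 'NE:1,SHELF:1,SLOT:1,x\nNE:2,SHELF:2,SLOT:2,x\n' > "$dir/ref.csv"
printf 'NE:1,SHELF:1,SLOT:1,y\n' > "$dir/old.csv"

count() {  # $1 = 1 to skip the reference file on its second read, 0 to count it
  awk -F, -v skip="$1" '
  # first read of the reference file: remember its keys
  NR==FNR { ref = FILENAME; a[$1 OFS $2 OFS $3]++; next }
  # every later file, including the reference file when the glob reaches it
  a[$1 OFS $2 OFS $3] { if (skip && FILENAME == ref) next; b[$1 OFS $2 OFS $3]++ }
  END { for (i in b) print i "\t" b[i] }
  ' OFS=, "$dir/ref.csv" "$dir"/*.csv | LC_ALL=C sort
}

excluded=$(count 1)   # behaves like the original script
included=$(count 0)   # behaves like the script without the FILENAME check
printf 'excluded:\n%s\nincluded:\n%s\n' "$excluded" "$included"
rm -rf "$dir"
```

With this sample data the first variant reports a count of 1 for NE:1 and nothing for NE:2, while the second reports 2 for NE:1 and 1 for NE:2.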


Original Script

#nawk -F, 'NR==FNR{file=FILENAME;a[$1OFS$2OFS$3]++;next} a[$1OFS$2OFS$3]{if(FILENAME~file)next;b[$1OFS$2OFS$3]++;}
 END{ for(i in a){if(a[i] && !b[i]){print "NEW: "i}} for(i in b){if(b[i])print i"\t\t"b[i]}}' OFS=, 20111125.csv *.csv
NEW: NE:564867,SHELF:10,SLOT:1
NEW: NE:565229,SHELF:4,SLOT:3
NEW: NE:507423,SHELF:6,SLOT:5
NEW: NE:508089,SHELF:7,SLOT:6
NEW: NE:557688,SHELF:10,SLOT:1
NEW: NE:985068,SHELF:2,SLOT:6
NEW: NE:503703,SHELF:16,SLOT:2
NEW: NE:249454,SHELF:2,SLOT:2
NE:556416,SHELF:9,SLOT:1                4
NE:565229,SHELF:7,SLOT:6                2
NE:20173322,SHELF:3,SLOT:1              1
NE:572549,SHELF:8,SLOT:6                5
NE:600866,SHELF:8,SLOT:3                17
NE:508089,SHELF:1,SLOT:6                1
NE:991626,SHELF:4,SLOT:3                3
NE:20159466,SHELF:6,SLOT:2              3
NE:508539,SHELF:7,SLOT:4                2
NE:506443,SHELF:17,SLOT:4               2
NE:20173322,SHELF:2,SLOT:5              1
NE:230388,SHELF:10,SLOT:2               1
NE:557688,SHELF:3,SLOT:1                3
NE:556029,SHELF:5,SLOT:2                1
NE:503284,SHELF:6,SLOT:3                13
NE:597829,SHELF:8,SLOT:3                3

Second script

nawk -F, 'NR==FNR{a[$1OFS$2OFS$3]++;next} a[$1OFS$2OFS$3]{b[$1OFS$2OFS$3]++;}
> END{ for(i in a){if(a[i] && !b[i]){print "NEW: "i}} for(i in b){if(b[i])print i"\t\t"b[i]}}' OFS=, 20111125.csv *.csv
NE:985068,SHELF:2,SLOT:6                1
NE:556416,SHELF:9,SLOT:1                5
NE:565229,SHELF:7,SLOT:6                3
NE:20173322,SHELF:3,SLOT:1              2
NE:572549,SHELF:8,SLOT:6                7
NE:600866,SHELF:8,SLOT:3                18
NE:249454,SHELF:2,SLOT:2                1
NE:503703,SHELF:16,SLOT:2               1
NE:508089,SHELF:1,SLOT:6                2
NE:991626,SHELF:4,SLOT:3                4
NE:20159466,SHELF:6,SLOT:2              4
NE:508539,SHELF:7,SLOT:4                3
NE:506443,SHELF:17,SLOT:4               3
NE:20173322,SHELF:2,SLOT:5              2
NE:230388,SHELF:10,SLOT:2               2
NE:507423,SHELF:6,SLOT:5                1
NE:557688,SHELF:3,SLOT:1                4
NE:556029,SHELF:5,SLOT:2                3
NE:508089,SHELF:7,SLOT:6                1
NE:503284,SHELF:6,SLOT:3                14
NE:565229,SHELF:4,SLOT:3                1
NE:557688,SHELF:10,SLOT:1               1
NE:597829,SHELF:8,SLOT:3                5
NE:564867,SHELF:10,SLOT:1               1

Just need the 1's to be displayed as "NEW"

nawk -F, 'NR==FNR{a[$1OFS$2OFS$3]++;next} a[$1OFS$2OFS$3]{b[$1OFS$2OFS$3]++}
 END{for(i in b){if(b[i]-1){print i"\t\t"b[i]-1}else{print "NEW :"i} } }' OFS=, 20111125.csv *.csv | sort

--ahamed


output

#nawk -F, 'NR==FNR{a[$1OFS$2OFS$3]++;next} a[$1OFS$2OFS$3]{b[$1OFS$2OFS$3]++}
 END{for(i in b){if(b[i]-1){print i"\t\t"b[i]-1}else{print "NEW :"i} } }' OFS=, 20111127.csv *.csv | sort -r
NEW :NE:571209,SHELF:18,SLOT:6
NEW :NE:566030,SHELF:2,SLOT:6
NEW :NE:564588,SHELF:5,SLOT:6
NEW :NE:556029,SHELF:8,SLOT:5
NEW :NE:510150,SHELF:9,SLOT:5
NEW :NE:508622,SHELF:10,SLOT:1
NEW :NE:20107650,SHELF:2,SLOT:4
NE:985068,SHELF:6,SLOT:4                1
NE:985068,SHELF:4,SLOT:1                2
NE:600866,SHELF:8,SLOT:3                20
NE:571209,SHELF:6,SLOT:1                51
NE:571209,SHELF:3,SLOT:3                14
NE:571209,SHELF:18,SLOT:2               1
NE:565808,SHELF:3,SLOT:4                1
NE:565229,SHELF:14,SLOT:1               2
NE:556029,SHELF:5,SLOT:2                3
NE:503284,SHELF:6,SLOT:3                14
NE:321636,SHELF:16,SLOT:1               4
NE:249314,SHELF:8,SLOT:1                6
NE:230388,SHELF:10,SLOT:2               2
NE:222268,SHELF:7,SLOT:6                34
NE:20173322,SHELF:5,SLOT:5              2
NE:20170632,SHELF:3,SLOT:3              12

It still shows the 1's where it should show 2. It does not look like it is including the 20111127.csv file in the calculation.

The entries with the count 1 are not new; they are present once in the other files. Please go through the code and experiment with it.

--ahamed

ok here's a comparison

nawk -F, 'NR==FNR{a[$1OFS$2OFS$3]++;next} a[$1OFS$2OFS$3]{b[$1OFS$2OFS$3]++;}
 END{for(i in b){if(b[i]-1>0){print i"\t\t"b[i]-1}else{print "NEW :"i} } }' OFS=, 20111127.csv *.csv | sort -r
NEW :NE:571209,SHELF:18,SLOT:6
NEW :NE:566030,SHELF:2,SLOT:6
NEW :NE:564588,SHELF:5,SLOT:6
NEW :NE:556029,SHELF:8,SLOT:5
NEW :NE:510150,SHELF:9,SLOT:5
NEW :NE:508622,SHELF:10,SLOT:1
NEW :NE:20107650,SHELF:2,SLOT:4
NE:985068,SHELF:6,SLOT:4                1
NE:985068,SHELF:4,SLOT:1                2
NE:600866,SHELF:8,SLOT:3                20
NE:571209,SHELF:6,SLOT:1                51
NE:571209,SHELF:3,SLOT:3                14
NE:571209,SHELF:18,SLOT:2               1
NE:565808,SHELF:3,SLOT:4                1
NE:565229,SHELF:14,SLOT:1               2
NE:556029,SHELF:5,SLOT:2                3
NE:503284,SHELF:6,SLOT:3                14
NE:321636,SHELF:16,SLOT:1               4
NE:249314,SHELF:8,SLOT:1                6
NE:230388,SHELF:10,SLOT:2               2
NE:222268,SHELF:7,SLOT:6                34
NE:20173322,SHELF:5,SLOT:5              2
NE:20170632,SHELF:3,SLOT:3              12


nawk -F, 'NR==FNR{a[$1OFS$2OFS$3]++;next} a[$1OFS$2OFS$3]{b[$1OFS$2OFS$3]++;}
END{ for(i in a){if(a[i] && !b[i]){print "NEW: "i}} for(i in b){if(b[i])print i"\t\t"b[i]}}' OFS=, 20111127.csv *.csv
NE:556029,SHELF:8,SLOT:5                1
NE:566030,SHELF:2,SLOT:6                1
NE:222268,SHELF:7,SLOT:6                35
NE:564588,SHELF:5,SLOT:6                1
NE:20107650,SHELF:2,SLOT:4              1
NE:510150,SHELF:9,SLOT:5                1
NE:565808,SHELF:3,SLOT:4                2
NE:565229,SHELF:14,SLOT:1               3
NE:600866,SHELF:8,SLOT:3                21
NE:20170632,SHELF:3,SLOT:3              13
NE:985068,SHELF:4,SLOT:1                3
NE:571209,SHELF:3,SLOT:3                15
NE:571209,SHELF:6,SLOT:1                52
NE:321636,SHELF:16,SLOT:1               5
NE:571209,SHELF:18,SLOT:2               2
NE:508622,SHELF:10,SLOT:1               1
NE:230388,SHELF:10,SLOT:2               3
NE:571209,SHELF:18,SLOT:6               1
NE:556029,SHELF:5,SLOT:2                4
NE:503284,SHELF:6,SLOT:3                15
NE:20173322,SHELF:5,SLOT:5              3
NE:249314,SHELF:8,SLOT:1                7
NE:985068,SHELF:6,SLOT:4                2

The file 20111127.csv is not being included in the calculation by the first script, whereas the second is perfect. I just need the entries with a count of 1 to be labelled NEW while still showing the 1.

Like I said, the entries with the count 1 and without the NEW tag are not new. They are old entries present in 20111127.csv and other files.

Please paste the output of

grep "NE:985068,SHELF:6,SLOT:4" *.csv

--ahamed
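
The grep check asked for above can be sketched as follows. The file names (day1.csv, day2.csv) and their rows are invented for illustration; only the key string comes from the thread. grep -c prints one count per file, which shows whether a key with a total of 2 comes from the reference file alone or from several files.

```shell
# Hypothetical sample files; only the key string is taken from the thread.
dir=$(mktemp -d)
printf 'NE:985068,SHELF:6,SLOT:4,a\n' > "$dir/day1.csv"
printf 'NE:985068,SHELF:6,SLOT:4,b\nNE:1,SHELF:1,SLOT:1,c\n' > "$dir/day2.csv"

key='NE:985068,SHELF:6,SLOT:4'

# One "file:count" line per file.
per_file=$(cd "$dir" && grep -c "$key" *.csv)

# Total number of matching lines across all files.
total=$(cat "$dir"/*.csv | grep -c "$key")

printf '%s\ntotal: %s\n' "$per_file" "$total"
rm -rf "$dir"
```

Note that grep matches substrings, so a key ending in SLOT:4 would also match SLOT:40; appending the trailing comma to the pattern is one way to avoid that when the data has a fourth field.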

Agreed, but I need the 1's to show the true count, i.e. it should show 2, since this reset is also seen in the 20111127.csv file.

nawk -F, 'NR==FNR{a[$1OFS$2OFS$3]++;next} a[$1OFS$2OFS$3]{b[$1OFS$2OFS$3]++;} END{ for(i in a){if(a[i] && !b[i]){print "NEW: "i}} for(i in b){if(b[i])print i"\t\t"b[i]}}' OFS=, 20111127.csv *.csv

The above script includes the 20111127.csv file in the calculation, but your latest one does not.

nawk -F, 'NR==FNR{a[$1OFS$2OFS$3]++;next} a[$1OFS$2OFS$3]{b[$1OFS$2OFS$3]++}
 END{for(i in b){if(b[i]-1){print i"\t\t"b[i]}else{print "NEW :"i} } }' OFS=, 20111125.csv *.csv | sort

--ahamed


works alleluia !!

---------- Post updated at 12:01 PM ---------- Previous update was at 11:48 AM ----------

Just need the NEW: lines to show \t\t 1 now.

Please show some effort!

--ahamed



OK, I need to print "\t\t 1" in the output, but I am not sure where it goes in the else{print "NEW:"i} statement.

help

{print "NEW:"i"\t\t1"}

--ahamed
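
Put together with the earlier script, the else branch might look like the sketch below. The sample files (ref.csv, old.csv) are hypothetical and awk stands in for nawk; the counting logic is the one from the thread, where a count of 1 in b means the key was seen only in the reference file itself.

```shell
# Hypothetical sample data: ref.csv is the file being compared.
dir=$(mktemp -d)
printf 'NE:1,SHELF:1,SLOT:1,x\nNE:2,SHELF:2,SLOT:2,x\n' > "$dir/ref.csv"
printf 'NE:1,SHELF:1,SLOT:1,y\n' > "$dir/old.csv"

report=$(awk -F, '
NR==FNR { a[$1 OFS $2 OFS $3]++; next }         # keys of the reference file
a[$1 OFS $2 OFS $3] { b[$1 OFS $2 OFS $3]++ }   # count those keys in every file
END {
  for (i in b) {
    if (b[i] - 1)
      print i "\t\t" b[i]          # seen more than once in total
    else
      print "NEW:" i "\t\t1"       # seen once only: tag it and still show the 1
  }
}' OFS=, "$dir/ref.csv" "$dir"/*.csv | LC_ALL=C sort)

printf '%s\n' "$report"
rm -rf "$dir"
```

One caveat under these assumptions: a key that occurs twice inside the reference file and nowhere else gets a count of 2 and is therefore not tagged NEW.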


How can I print the filename as the first line?

nawk -F, 'NR==FNR{a[$1OFS$2OFS$3]++;next} a[$1OFS$2OFS$3]{b[$1OFS$2OFS$3]++} v!~FILENAME{f=f" "FILENAME;v=FILENAME}
END{print f;for(i in b){if(b[i]-1){print i"\t\t"b[i]}else{print "NEW :"i} } }' OFS=, 20111125.csv *.csv | sort

--ahamed


That just prints all the filenames, which is not what I wanted.

Which filename do you want?

Maybe this?

nawk -F, 'NR==FNR{f=FILENAME;a[$1OFS$2OFS$3]++;next} a[$1OFS$2OFS$3]{b[$1OFS$2OFS$3]++}
END{print f;for(i in b){if(b[i]-1){print i"\t\t"b[i]}else{print "NEW :"i} } }' OFS=, 20111125.csv *.csv | sort

--ahamed
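
Since the filename is printed inside END and then piped through sort, it stays on the first line only because a name starting with a digit sorts before the NE: lines; with sort -r it would end up last. A possible variant, sketched here with hypothetical sample files and awk in place of nawk, prints the header outside the sort pipeline so its position does not depend on the sort order.

```shell
# Hypothetical sample data: ref.csv is the file being compared.
dir=$(mktemp -d)
printf 'NE:1,SHELF:1,SLOT:1,x\nNE:2,SHELF:2,SLOT:2,x\n' > "$dir/ref.csv"
printf 'NE:1,SHELF:1,SLOT:1,y\n' > "$dir/old.csv"
ref="$dir/ref.csv"

report=$(
  basename "$ref"    # header first, outside the sort pipeline
  awk -F, '
  NR==FNR { a[$1 OFS $2 OFS $3]++; next }
  a[$1 OFS $2 OFS $3] { b[$1 OFS $2 OFS $3]++ }
  END {
    for (i in b) {
      if (b[i] - 1)
        print i "\t\t" b[i]
      else
        print "NEW :" i
    }
  }' OFS=, "$ref" "$dir"/*.csv | LC_ALL=C sort -r
)

printf '%s\n' "$report"
rm -rf "$dir"
```

Only the awk output goes through sort here, so the header stays on top for both sort and sort -r.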


20111125.csv as the first line

Check my previous post!

--ahamed
