awk -F_ ' # set field separator to underscore
{
A[$1 FS $2]++ # count how many times $1 FS $2 occurs (fields 1 and 2 joined by an underscore, for example "data_1")
}
END{
for(i in A) print i, A[i] # At the end, print each prefix and its count
}
'
The code has worked impressively, many thanks for that.
I want to write a line that checks whether the number of files with a given prefix is more than 5 and, if so, prints the prefix names on one line (separated by a single space), such as
data_1 data_2
I wrote this line
ls | awk -F_ '{A[$1 FS $2]++} END {for (j in A) {if (A[j] > 5) {printf j, " "}}}'
However the output from this line is
data_1data_2
It doesn't seem to recognise the single space I asked for between the prefixes. Do you know what I may have done wrong?
But while waiting for your reply I also found that if I remove the "," in the printf argument in my original code, so that
ls | awk -F_ '{A[$1 FS $2]++} END {for (j in A) if (A[j] > 5) printf j " "}'
It worked, which goes against what I have read about the syntax of awk's printf. I don't know whether it has something to do with the shell (zsh) I am using or some other reason I don't understand.
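Yes, in that case j and " " are concatenated, so printf then uses the resulting string as its single argument, the format string. In your first attempt the comma made j the format string and " " an extra argument, which is ignored because j contains no format specifiers. However, I would not recommend using printf with data in the format field: a % in a file name would be interpreted as a format specifier.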
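A minimal sketch of the safer form: keep the format string constant and pass the data as an argument. The names() helper and its file names are made up for illustration and stand in for ls; the threshold is lowered from 5 to 2 only to keep the sample short.

```shell
# Hypothetical file names standing in for the output of ls.
names() {
  printf '%s\n' data_1_a data_1_b data_1_c data_2_a
}

# Constant format string "%s "; j is passed as data, so a "%" in a name is harmless.
# The final print "" ends the line after the last prefix.
out=$(names | awk -F_ '{A[$1 FS $2]++} END {for (j in A) if (A[j] > 2) printf "%s ", j; print ""}')
echo "$out"
```

With this sample input only data_1 has more than 2 files, so only that prefix is printed, followed by a single space.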
I can't seem to get printf to work properly. For example
ls | awk -F_ '{A[$1 FS $2]++} END {for (j in A) print j, A[j]}'
would output
data_1 200
data_2 34
while the equivalent command with printf
ls | awk -F_ '{A[$1 FS $2]++} END {for (j in A) printf j, A[j]}'
would only output
data_1data_2
while ignoring the A[j]
The reason I want to do this is that I want to line up the output more neatly. At the moment, for my test directory, I am getting (while using \t as the separator)
loooooooooger_prefix1 200
shorter_prefix2 34
but I want to get
loooooooooger_prefix1 200
shorter_prefix2 34
---------- Post updated at 10:23 AM ---------- Previous update was at 10:21 AM ----------
Sorry, my message format wasn't displayed properly,
but I want to output the prefix and the number of files with each prefix aligned in columns. Using \t doesn't work too well if the prefixes have different lengths.
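One possible way to get aligned columns, sketched under some assumptions: the names() helper and its file names are invented for illustration and stand in for ls, and the column width of 25 is an arbitrary choice that must be at least as long as the longest prefix. A width in the printf format pads each prefix to the same length.

```shell
# Hypothetical file names standing in for the output of ls.
names() {
  printf '%s\n' loooooooooger_prefix1_a loooooooooger_prefix1_b shorter_prefix2_a
}

# %-25s left-justifies the prefix in a 25-character column; %d prints the count.
# sort is only there because for (j in A) visits keys in no defined order.
table=$(names | awk -F_ '{A[$1 FS $2]++} END {for (j in A) printf "%-25s %d\n", j, A[j]}' | sort)
echo "$table"
```

Unlike print, printf does not add a newline by itself, which is why the format ends in \n.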