Shell script for field wise record count for different Files .csv files

Hi,

Very good wishes to all!

Please help to provide the shell script for generating the record counts in filed wise from the .csv file

My question:

Source file:

Field1 Field2 Field3
abc 12f sLm
1234 hjd 12d
Hyd 34
Chn

My target file should generate the .csv file with the record count on field wise as below:

Target File:

Field1 Field2 Field3
4 3 2

Field1 has 4 values and field2 has 3 values and field3 has two values so the target file should be as below.

My source file can changes with more columns or less, so the target file should also have the same number of columns with the record count. Please help to resolve this..

Regards,
Kiran.

Hello Kirands,

Please use code tags for code/Input/commands which you are using into you posts as per forum rules. Following may help you in same.

awk 'NR==1{print;next} {for(i=1;i<=NF;i++){array++}} END{for(j in array){A=A?A "\t" array[j]:array[j]};print A}'   Input_file
 

Output will be as follows.

Field1 Field2 Field3
4       3       2

Thanks,
R. Singh

1 Like

Hi Ravinder,

Thank you so much for your response.
The data is looking good when i run the code given by you.. can you please help to put that data in the .csv file with the comma separator.

Regards,
Kiran.

Hello Kirands,

Sorry forgot to mention that previously, following may help you in same.

awk 'NR==1{OFS=",";for(i=1;i<=NF;i++){Q=Q?Q OFS $i:$i}print Q;next} {for(i=1;i<=NF;i++){array++}} END{for(j in array){A=A?A OFS array[j]:array[j]};print A}' OFS=,  Input_file > Output.csv

Thanks,
R. Singh

1 Like

Hi Ravinder,

Thank you for your prompt response.

I have tested the code given by you, but its still have some issues in giving the correct record count.

Please find the attached source file that has the data and the two headers., the record count should be from the 3rd row.
Please help to resolve this issue.

Thanks for your support.

Regards,
Kiran.

Hello Kirands,

I am sorry, I couldn't open the .rar file. According to your post you need to leave first two lines then let's say we have following Input_file:

Field1 Field2 Field3
XYZ ABC DEF
abc 12f sLm
1234 hjd 12d
Hyd 34
Chn

Then a little change in my previous post code as follows.

awk 'NR==1 || NR==2{OFS=",";for(i=1;i<=NF;i++){Q=Q?Q OFS $i:$i}print Q;Q="";next} {for(i=1;i<=NF;i++){array++}} END{for(j in array){A=A?A OFS array[j]:array[j]};print A}' OFS=,  Input_file > output.csv
 

Let me know if you have any queries here.

Thanks,
R. Singh

1 Like

Hi Ravindra,

Thank you for your response.

I have executed the latest code send by you :

awk 'NR==1 || NR==2{OFS=",";for(i=1;i<=NF;i++){Q=Q?Q OFS $i:$i}print Q;Q="";next} {for(i=1;i<=NF;i++){array++}} END{for(j in array){A=A?A OFS array[j]:array[j]};print A}' OFS=,  RecountCountTest.csv> example_4.csv

The output data is giving correct for first 4 columns and the last 5 columns are not giving the correct record count, below are the source file and the target file(after i executed the above code) for your reference:

Source file:
First two are the headers and should not be in the count working good:

Target file after i executed the code provided by you:

for the columns NICKNAME count should be 0

Please help to resolve this issue, thanks for all your support.

Regards,
Kiran.

---------- Post updated at 07:52 AM ---------- Previous update was at 07:49 AM ----------

Hi Ravinder,

Here is the source data as per my previous reply if you did not able to see the data:

Source data:

STATUS USERID USERNAME FIRSTNAME NICKNAME MI LASTNAME SUFFIX TITLE
STATUS USERID Username FirstName Nickname MiddleName LastName Suffix Title
ACTIVE 80001001 itria1 Carlos Guillermo Itria
ACTIVE 80001002 lafournere1 Jean-Paul
ACTIVE 80001003 Wade Peter
ACTIVE 80001004 Olivier

Target Data: (after i executed the code provided by you:)

STATUS USERID USERNAME FIRSTNAME NICKNAME MI LASTNAME SUFFIX TITLE
STATUS USERID Username FirstName Nickname MiddleName LastName Suffix Title
4 4 2 4 4 4

Regards,
Kiran.

Hello Kirands,

Thank you for posting the sample data, request you to please use code tags like I am using as follows for commands into your posts as per forum rules. If you need to show the columns which have NULL values too then following may help.

awk 'NR==1 || NR==2{n=NF;OFS=",";for(i=1;i<=NF;i++){Q=Q?Q OFS $i:$i}print Q;Q="";next} {for(i=1;i<=NF;i++){array++}} END{for(j=1;j<=n;j++){A=A?A OFS array[j]:array[j]};if(A){print A;A=""}}' OFS=,   Input_file

Output will be as follows.

STATUS,USERID,USERNAME,FIRSTNAME,NICKNAME,MI,LASTNAME,SUFFIX,TITLE
STATUS,USERID,Username,FirstName,Nickname,MiddleName,LastName,Suffix,Title
4,4,4,3,1,1,,,
 

If you don't need the columns which have entirely NULL values in Input_file then following may help you in same.

awk 'NR==1 || NR==2{n=NF;OFS=",";for(i=1;i<=NF;i++){Q=Q?Q OFS $i:$i}print Q;Q="";next} {for(i=1;i<=NF;i++){array++}} END{for(j=1;j<=n;j++){if(array[j]){A=A?A OFS array[j]:array[j]}};print A;A=""}' OFS=,   Input_file
 

Output will be as follows.

STATUS,USERID,USERNAME,FIRSTNAME,NICKNAME,MI,LASTNAME,SUFFIX,TITLE
STATUS,USERID,Username,FirstName,Nickname,MiddleName,LastName,Suffix,Title
4,4,4,3,1,1
 

Also you could redirect output of these above commands to a Output_file.csv file too, hope this helps you.

Thanks,
R. Singh

As the sequence of records is undefind for the (j in array) construct, and, as empty/blank fields should not be counted, this might be better fit:

awk '
NR<=2   {print
         next
        }
        {for (i=1; i<=NF; i++) array += ($i !~ /^ *$/)
         if (NF > MX) MX = NF
        }
END     {for (i=1; i<=MX; i++)  printf "%s%s", array+0, (i!=MX)?OFS:ORS
        }
' FS=, OFS=, /tmp/RecoundCountTest.csv

Hi Ravindra,

Thanks for your response.

Below is the source data:

STATUS,USERID,USERNAME,FIRSTNAME,NICKNAME,MI,LASTNAME,SUFFIX,TITLE
STATUS,USERID,Username,FirstName,Nickname,MiddleName,LastName,Suffix,Title
ACTIVE,80001001,itria1,Carlos Guillermo, , ,Itria, , 
ACTIVE,80001002,lafournere1,Jean-Paul, , ,, , 
ACTIVE,80001003,,Wade Peter, , ,, , 
ACTIVE,80001004,,Olivier, , ,, , 

Could you please have a look on this and please provide me the code to get the record count:
I have checked the latest code provided, but could not give me correct results. The above source data is now separated with the columns with the data having nulls with the commas (,)..

My output should be as below :

STATUS,USERID,USERNAME,FIRSTNAME,NICKNAME,MI,LASTNAME,SUFFIX,TITLE
STATUS,USERID,Username,FirstName,Nickname,MiddleName,LastName,Suffix,Title
4,4,2,4,0,0,1,0,0

Please help to resolve.. Thank you for your support.

Regards,
Kiran.

---------- Post updated at 09:04 AM ---------- Previous update was at 08:22 AM ----------

Hi Rudic,

Thank you for your response.

The code provide by you is giving the correct but the last column output result is not giving the correct record count.

Source file is below :

STATUS,USERID,USERNAME,FIRSTNAME,NICKNAME,MI,LASTNAME,SUFFIX,TITLE
STATUS,USERID,Username,FirstName,Nickname,MiddleName,LastName,Suffix,Title
ACTIVE,80001001,itria1,Carlos Guillermo, , ,Itria, , 
ACTIVE,80001002,lafournere1,Jean-Paul, , ,, , 
ACTIVE,80001003,,Wade Peter, , ,, , 
ACTIVE,80001004,,Olivier, , ,, , 

Target file after running the code provided by you is giving as below :

STATUS,USERID,USERNAME,FIRSTNAME,NICKNAME,MI,LASTNAME,SUFFIX,TITLE
STATUS,USERID,Username,FirstName,Nickname,MiddleName,LastName,Suffix,Title
4,4,2,4,0,0,1,0,4

The last columns count is giving as 4 which is incorrect.

Expected result is as below : (last column should be 0 as count)

STATUS,USERID,USERNAME,FIRSTNAME,NICKNAME,MI,LASTNAME,SUFFIX,TITLE
STATUS,USERID,Username,FirstName,Nickname,MiddleName,LastName,Suffix,Title
4,4,2,4,0,0,1,0,0

Please help to resolve this issues.

Thanks you,
Kiran.

Reminder:
We are not here to do the work for you, so far in this thread you have not shown what you have tried before opening the thread...
So before posting

You should try to understand what has been posted and maybe ask what bit isnt clear for you as you are trying to adapt but face an issue

1 Like

The last columns count is giving as 4 which is absolutely correct as your file has DOS line terminators 0x0D (or <CR> or \r) that result in a non-empty field which is to be counted. Remove those to make it a correct *nix text file, and it will fly.

1 Like

Dear Vbe,

I really tried and did more analysis and executing the code given by expertise, going forward i will sure mention my analysis and understanding to the expertise.

Thank you, Kiran.

In addition to CR/LF line terminators instead of <newline> line terminators, note also that when the field separator is a comma, a <space> in a field is not an empty field. So, for instance, the line in the sample input:

ACTIVE,80001002,lafournere1,Jean-Paul, , ,, , 

when expanded so we can see all of the spaces:

printf '%s' 'ACTIVE,80001002,lafournere1,Jean-Paul, , ,, , '|od -bc 
0000000   101 103 124 111 126 105 054 070 060 060 060 061 060 060 062 054
           A   C   T   I   V   E   ,   8   0   0   0   1   0   0   2   ,
0000020   154 141 146 157 165 162 156 145 162 145 061 054 112 145 141 156
           l   a   f   o   u   r   n   e   r   e   1   ,   J   e   a   n
0000040   055 120 141 165 154 054 040 054 040 054 054 040 054 040        
           -   P   a   u   l   ,       ,       ,   ,       ,            
0000056

only has one empty field (noted by the 2 adjacent commas marked in red). The last field on that line contains a <space> character and is, therefore, not empty.

... which I accounted for with /^ *$/ ...