To find record having null value

Hi All

My requirement is to find the null values in particular column of a file and reject it in case if it contains null values. But the challenge is that I want a common command which can be used across different file, as the position of the column we need to check for different file may get vary.

For ex. in the first file we need to check null values in 2nd & 3rd column, where as in the second file we need to check in 4th, 5th & 6th column.All my files are pipe delimited

Thanks in advance

Hello ginrkf,

I have a few to questions pose in response first:-

  • Is this homework/assignment? There are specific forums for these.
  • What have you tried so far?
  • What output/errors do you get?
  • What OS and version are you using?
  • What are your preferred tools? (C, shell, perl, awk, etc.)
  • What logical process have you considered? (to help steer us to follow what you are trying to achieve)

Most importantly, What have you tried so far?

There are probably many ways to achieve most tasks, so giving us an idea of your style and thoughts will help us guide you to an answer most suitable to you so you can adjust it to suit your needs in future.

We're all here to learn and getting the relevant information will help us all.

Regards,
Robin

Hi Rbatte

Sorry for not providing much details on this. Please see my response below.

Is this homework/assignment? There are specific forums for these.
--> This is project requirement for me, where I need to implement for all the incoming files
What have you tried so far?
--> I tried different by using awk command

awk -F ',' $2!=''||$3!=''

But the problem is that the position of the column where I need to check null values vary.More over I am not that expert in UNIX.

What output/errors do you get?

--> I got my desired o/p, but which can used for only for specific file.I am not able to come up with a shared command which can be used across all the files in such a way that, we can pass the column position as parameter.
What OS and version are you using?

--> I am using UNIX

What are your preferred tools? (C, shell, perl, awk, etc.)

--> awk

Hopes this helps..Kindly let me know if you need more detail

Please, could you provide a realistic input sample of your file?
Please, could you provide a sample of the kind of output you want based on that input file?

Hi Please find my sample file

File1

123,asd,fgh,nmb,pok
134,,fg,mnm,mnb

For the above file I need to reject the second record as the second column contain null value.

File2

1235,afg,fgdf,nmbs,posk
1345,,fg,,mnsb

For file 2 by requirement is different where I need to reject a record where 2nd and 4th column is null

So I want to write a unique command where I can pass the column position to check where it is null or not.For ex. I want to pass column position as 2 for the first file and column position as 2&4 for the second file

Hello ginrkf,

From this awk -F ',' $2!=''||$3!='' I can see that you are separating fields on the comma but then I get confused. I think you test that the 2nd field is not null OR the 3rd field is not null.

What is the end goal? I'm wondering if a grep with an extended regular expression might be better, something like this:-

grep -Ev "^[0-9]*,," file1

This is excluding ( -v ) lines that match the pattern "Any line starting with zero or more digits followed by two commas (i.e. a null value)"

If that's okay, then we could do something similar for file2, but I'd rather not leap into that if the above is not what you want. The rules can be adjusted to take account of your real data, of course.

Does this help?

Robin

Try:

awk -F, 'FNR==1{split(c,C)} {p=0; for(i in C) if ($C!="") p=1} p' c=2 file1 c=2,4 file2
1 Like

Hello grinkrf,

Could you please try following too once and let me know if this helps you.
You could dynamically put as per file like whichever field you want to check.

awk -F"," 'function verify(var){num=split(var, A);for(i in A){if($A){p++}};if(num==p){print};p=""} FILENAME=="Input_file1"{verify("2");} FILENAME=="Input_file2"{verify("2,4")}'  Input_file1   Input_file2

Output will be as follows.

123,asd,fgh,nmb,pok
1235,afg,fgdf,nmbs,posk

EDIT: Adding a non-one liner form of solution now too.

awk -F"," 'function verify(var){
                                        num=split(var, A);
                                        for(i in A){
                                                        if($A){
                                                                        p++
                                                                 }
                                                   };
                                        if(num==p) {
                                                        print
                                                   };
                                        p=""
                               }
           FILENAME=="Input_file1"   {
                                        verify("2");
                               }
           FILENAME=="Input_file2"   {
                                        verify("2,4")
                               }
          '  Input_file1  Input_file2

Thanks,
R. Singh

1 Like

Unfortunately, you did not provide a sample of the desired output and your words leave a lot to interpretation. The input provided is quite ambiguous, as well, due to the skimpiness of the sample.
In post #1 you said:

Which make the examples not realistic neither.

A record normally is a string of characters terminated at the new line (a full human line). Is that the same way you are interpreting it? Do you want the resulted output to exclude any lines that match your criteria?
In your comment "2nd and 4th column is null":
Is your criteria for File2 to exclude the lines that have BOTH fields 2 AND 4 empty or to exclude lines that have EITHER fields 2 OR 4 empty?

Thanks to all for your inputs.The below code is working

awk -F"|" 'function verify(var){
                                        num=split(var, A);
                                        for(i in A){
                                                        if($A){
                                                                        p++
                                                                 }
                                                   };
                                        if(num==p) {
                                                        print
                                                   };
                                        p=""
                               }
           FILENAME=="sample_file"   {
                                        verify("2");
                               }
                                          
          '  sample_file

The result which the above code is giving is, printing all the records where the column 2 does not contain any empty value.Thanks RavinderSingh .

I need some more modification with the above code in such a way that correct records should print to one file and incorrect records to other file.Kindly help in modifying the above code to achieve this functionality also.

Also I want to parameterise the column position and file name, so that I can create a script and pass the file name and column position as values dynamically.

Hello ginrkf,

Could you please try following and let me know if this helps you, I haven't tested it though.

awk -F"," 'function verify(var){
                                        num=split(var, A);
					print FILENAME " is getting processed now..." >> "correct_records"
					print FILENAME " is getting processed now..." >> "Incorrect_records"
                                        for(i in A){
                                                        if($A){
                                                                        p++
                                                                 }
                                                   };
                                        if(num==p) {
                                                        print >> "correct_records"
                                                   };
					else       {
							print >> "Incorrect_records"
						   };
                                        p=""
					print "*******************" >> "correct_records"
					print "*******************" >> "Incorrect_records"
                               }
           FILENAME=="Input_file1"   {
                                        verify("2");
                               }
           FILENAME=="Input_file2"   {
                                        verify("2,4")
                               }
          '  Input_file1  Input_file2

If you need any additional things in code then kindly show us the sample output for same too.

Thanks,
R. Singh

Hi Ravinder

The code is showing me some syntax error.

awk: cmd. line:12:                                      else       {
awk: cmd. line:12:                                      ^ syntax error

Also my expectation is that I will be writing this to a test_script.sh

awk -F"," 'function verify(var){
                                        num=split(var, A);
					print FILENAME " is getting processed now..." >> "correct_records"
					print FILENAME " is getting processed now..." >> "Incorrect_records"
                                        for(i in A){
                                                        if($A){
                                                                        p++
                                                                 }
                                                   };
                                        if(num==p) {
                                                        print >> "correct_records"
                                                   };
					else       {
							print >> "Incorrect_records"
						   };
                                        p=""
					print "*******************" >> "correct_records"
					print "*******************" >> "Incorrect_records"
                               }
           FILENAME=="$1"   {
                                        verify("$2");
                               }
           
          '  $1

So that while calling the test_script.sh script I will pass the file name for $1 and column position for $2 as given below

sh test_script.sh SAMPLE_FILE 2

Hello ginrkf,

Could you please run following in a script form it should work then as follows.

awk -F"," 'function verify(var){
                                        num=split(var, A);
                                        for(i in A){
                                                        if($A){
                                                                        p++
                                                                 }
                                                   };
                                        if(num==p) {
                                                         print $0 " with file name: " FILENAME>> "correct_lines"
                                                   };
                                        if(num!=p) {
                                                        print $0 " with file name: " FILENAME >> "incorrect_lines"
                                                   }
                                        p=""
                               }
           FILENAME=="Input_file1"   {
                                        verify("2");
                               }
           FILENAME=="Input_file2"   {
                                        verify("2,4")
                               }
          '   Input_file1    Input_file2
 

Output came for above is as follows.

cat incorrect_lines
134,,fg,mnm,mnb with file name: file1
1345,,fg,,mnsb with file name: file2

cat correct_lines
123,asd,fgh,nmb,pok with file name: file1
1235,afg,fgdf,nmbs,posk with file name: file2

Thanks,
R. Singh

Hi,

I have modified the same script in such way that I can pass the value as an external parameter while running.

vi test_script.sh
a=$1
awk -F"|" 'function verify(var){
                                        num=split(var, A);
                                        for(i in A){
                                                        if($A){
                                                                        p++
                                                                 }
                                                   };
                                        if(num==p) {
                                                         print $0 " with file name: " FILENAME>> "correct_lines"
                                                   };
                                        if(num!=p) {
                                                        print $0 " with file name: " FILENAME >> "incorrect_lines"
                                                   }
                                        p=""
                               }
           FILENAME=="SAMPLE_FILE"   {
                                        verify("$a");
                               }

          '   SAMPLE_FILE
sh test_script.sh 2

But in this case its not working.if we hard code the value there its working as expected.

Hello ginrkf,

So in awk if you want to pass shell arguments you can't pass those values like that, could you please try following(I haven't tested though).

cat script.ksh
a=$1
awk -vval="$a" -F"|" 'function verify(var){
                                        num=split(var, A);
                                        for(i in A){
                                                        if($A){
                                                                        p++
                                                                 }
                                                   };
                                        if(num==p) {
                                                         print $0 " with file name: " FILENAME>> "correct_lines"
                                                   };
                                        if(num!=p) {
                                                         print $0 " with file name: " FILENAME >> "incorrect_lines"
                                                   }
                                        p=""
                               }
           FILENAME=="SAMPLE_FILE"   {
                                        verify(val);
                               }
           '   SAMPLE_FILE

Also you have put -F"|" in above code just wanted to tell you it means you are setting delimiter as | but in your sample Input_file I couldn't see that, so if you have different Input_file which is having | as delimiter then it is ok else we have to see how the Input_file looks like.

Thanks,
R. Singh