Returning specific columns upon matching

vamsikrishna928 · September 18, 2014, 12:23pm

Hi All,

I need help with this requirement.

I have fileA with one column and fileB with 26 columns.

I need to match each value from fileA against fileB. When there is a match, I have to return that value from fileB, the value right after it, and the 5th and 6th values counting from the match.

NOTE: the position of the matching value varies in fileB.

For example,

FileA:

abc
def

FileB:

abc|123|xyz|789|jkl|345|sez|435|.....|367 (26 values)

Since abc is a match, output should have

abc|123|jkl|345

If the matched value is at position x in fileB, the output file should contain the fields at these positions:

x|x+1|x+4|x+5

How can I achieve this in Unix? Please help. Thanks in advance.

Scrutinizer · September 18, 2014, 12:34pm

Try:

awk 'NR==FNR{A[$1]; next} {for(i=1; i<=NF-5; i++) if($i in A) print $i, $(i+1), $(i+4), $(i+5)}' fileA FS=\| OFS=\| fileB
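
For anyone who wants to see the one-liner run, here is a minimal sketch on the sample data from the question. The scratch directory, the variable names (tmpdir, result), the shortened 8-field lines, and the second fileB line (with the key in field 3) are assumptions made up for illustration; the real fileB has 26 fields.

```shell
# Scratch directory so no real files are touched (assumption for this demo).
tmpdir=$(mktemp -d)

# fileA: one key per line, as in the question.
printf 'abc\ndef\n' > "$tmpdir/fileA"

# fileB: shortened sample lines. The second line is invented to show a key
# that sits in field 3 instead of field 1.
printf 'abc|123|xyz|789|jkl|345|sez|435\n' > "$tmpdir/fileB"
printf 'q|r|def|1|2|3|4|5\n' >> "$tmpdir/fileB"

# While reading the first file (NR==FNR), remember every key in array A.
# While reading the second file, test each field; on a match print the field
# and the fields at offsets +1, +4 and +5. The loop stops at NF-5 so that
# $(i+5) always exists.
result=$(awk 'NR==FNR{A[$1]; next}
  {for(i=1; i<=NF-5; i++) if($i in A) print $i, $(i+1), $(i+4), $(i+5)}
' "$tmpdir/fileA" FS='|' OFS='|' "$tmpdir/fileB")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

On this sample the script prints abc|123|jkl|345 and def|1|4|5. The FS and OFS assignments appear after fileA on the command line, so they only take effect when awk starts reading fileB.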

vamsikrishna928 · September 18, 2014, 8:41pm

Thanks for the reply.

Unfortunately, this script is not giving results. The output file is empty. Here is the script I use. Please correct me if there is anything wrong.

dos2unix fileA
dos2unix fileB
awk 'NR==FNR{A[$1]; next} {for(i=1; i<=NF-5; i++) if($i in A) print $i, $(i+1), $(i+4), $(i+5)}' fileA FS=\| OFS=\| fileB > FileC

I need the results written to a new file, FileC. Also, if the delimiter in fileB is a comma instead of a pipe, what do I need to change?

Thanks in advance.

Don_Cragun · September 18, 2014, 9:32pm

Is your question about the delimiter being a comma instead of a pipe your way of saying: "I'm sorry I gave you bad information about my input file format; the field separator is a comma instead of a vertical bar. Can you please help me fix the code you gave me because I made a mistake?"

If that is what you intended to say, you could try something like this instead:

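awk '
# Strip carriage returns from every line of both input files.
{       gsub(/\r/, "")
}
# First file (fileA): remember each key, then skip to the next line.
NR==FNR{A[$1]
        next
}
# Second file (fileB): print each matching field together with the
# fields at offsets +1, +4, and +5.
{       for(i=1; i<=NF-5; i++)
                if($i in A)
                        print $i, $(i+1), $(i+4), $(i+5)
}' fileA FS=',' OFS=',' fileB > FileC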

Note that the call to gsub() takes care of the carriage return removal from both of your input files so you no longer need to invoke dos2unix twice.
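
As a rough check of that claim, the sketch below feeds the same logic two small comma-separated files that still have DOS line endings. The file contents, the scratch directory, and the variable names (tmpdir, clean) are assumptions for illustration only.

```shell
# Scratch directory for the demo files (assumption for this sketch).
tmpdir=$(mktemp -d)

# Both files end every line with \r\n, as they would before running dos2unix.
printf 'abc\r\ndef\r\n' > "$tmpdir/fileA"
printf 'abc,123,xyz,789,jkl,345,sez,435\r\n' > "$tmpdir/fileB"

# gsub() edits $0, so awk re-splits the fields after the carriage returns
# are gone. The keys stored in A are then "abc" and "def" without a \r.
clean=$(awk '
{ gsub(/\r/, "") }
NR==FNR { A[$1]; next }
{ for(i=1; i<=NF-5; i++)
    if($i in A)
      print $i, $(i+1), $(i+4), $(i+5)
}' "$tmpdir/fileA" FS=',' OFS=',' "$tmpdir/fileB")

printf '%s\n' "$clean"
rm -rf "$tmpdir"
```

This prints abc,123,jkl,345 with no carriage return in the output, without dos2unix having been run on either file.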

vamsikrishna928 · September 18, 2014, 9:46pm

Hi Don,

The source file delimiter was changed from a pipe to a comma. Thanks a lot for this code, it worked like magic. Thanks again!

Don_Cragun · September 18, 2014, 9:52pm

Thank Scrutinizer. It was his code (with very slight modifications to change to your new field separators and to get rid of the need for the two calls to dos2unix) that solved your problem!

vamsikrishna928 · September 18, 2014, 9:55pm

You are correct, thanks to Scrutinizer as well!

MadeInGermany · September 19, 2014, 4:27am

sub(/\r$/, "")

(remove one \r at the end of the line)
is faster and more precise than

gsub(/\r/, "")

(remove all \r in the line)
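
To make the difference in precision concrete, here is a small sketch using a made-up record that has one carriage return inside the data and one at the end of the line. The sample record and the "<CR>" marker (used only to make any leftover \r visible) are assumptions for this demo.

```shell
# sub(/\r$/, "") removes only the trailing \r; the embedded one survives
# and is made visible as <CR> by the second call.
after_sub=$(printf 'a\rb,c\r\n' | awk '{sub(/\r$/, ""); gsub(/\r/, "<CR>"); print}')

# gsub(/\r/, "") removes every \r, including the one inside the data,
# so the second call finds nothing left to mark.
after_gsub=$(printf 'a\rb,c\r\n' | awk '{gsub(/\r/, ""); gsub(/\r/, "<CR>"); print}')

printf '%s\n' "$after_sub"    # a<CR>b,c
printf '%s\n' "$after_gsub"   # ab,c
```

Whether an embedded \r should be kept depends on the data; for files that only carry DOS line endings, both calls give the same result.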