Number of words in line, while loop, search and grep

baris35 · September 25, 2018, 5:25am

Hello,
What I wish to attain is:

to read fileA line by line
search entire line as string in fileB
when found, grep the preceding line in fileB
then merge "searched line" and "found line" in a new file, fileC

Here is my fileA:

T S Eliot
J L Borges
L Aragon
L L Aragon
T S Eliot 4 0 3333
L L Aragon 2
J L Borges 5

FileB:

0290202092090
J F Kennedy America Brookline
+92999929999990
Abraham Lincoln Hodgenville
+2828288889999
L Aragon
+330000000000
Tinto Brass Milano Italy 
+330022110033
J L Borges
+440011223344
T S Eliot

Expected Output, fileC:

T S Eliot
+440011223344
J L Borges
+330022110033
L Aragon
+2828288889999

This is what I tried:

while read line
do
    count=`echo $line | wc -w`
    if [ $count -eq 6 ]; then
grep "$COL1 $COL2 $COL3 $COL4 $COL5 $COL6" -B1 fileB
        if [ $count -eq 5 ]; then
grep "$COL1 $COL2 $COL3 $COL4 $COL5" -B1 fileB
        if [ $count -eq 4 ]; then
grep "$COL1 $COL2 $COL3 $COL4" -B1 fileB
        if [ $count -eq 3 ]; then
grep "$COL1 $COL2 $COL3" -B1 fileB
        if [ $count -eq 2 ]; then
         break;
    fi
done < fileA > fileC

It gives: syntax error near unexpected token `done'
PS: I just set the limit for the count variable to 6; I am not sure how to set it to the maximum number of words.

I'd be happy if you could lead me.

Many thanks
Boris

Don_Cragun · September 25, 2018, 6:06am

You have five ifs with only one fi. The error message you're getting is saying that the shell is expecting four more fis before seeing done.

If you understand how to set the count variable, what don't you understand about setting a variable named max or max_word_nr?
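For readers wondering how such a maximum could be computed: a minimal sketch, assuming awk is available and using a temporary file as a stand-in for fileA. The name max_word_nr is taken from the post above; the sample lines are only illustrative.

```shell
# Sample data standing in for fileA (hypothetical temp file).
tmp=$(mktemp)
printf '%s\n' 'T S Eliot' 'L Aragon' 'T S Eliot 4 0 3333' > "$tmp"

# NF is awk's per-line field (word) count; keep the largest one seen.
# "max + 0" prints 0 instead of an empty string for an empty file.
max_word_nr=$(awk 'NF > max { max = NF } END { print max + 0 }' "$tmp")
echo "$max_word_nr"

rm -f "$tmp"
```

This reads the file once, so no per-line call to wc is needed.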

rovf · September 25, 2018, 7:37am

In addition to what Don Cragun said, please make clear which shell you want to use. This can't be inferred unambiguously from your code example.

RudiC · September 25, 2018, 9:45am

Are you aware that (with what you posted in post#1) the

grep "$COL1 $COL2 $COL3 $COL4 $COL5 $COL6" -B1 fileB

will "grep" for five consecutive spaces?
And that the order of each pair of lines needs to be reversed to get your desired output?
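The point about the spaces can be checked directly. A small sketch, assuming COL1 to COL6 are never assigned anywhere (as in the script in post #1), and showing one possible way to fill such variables with read:

```shell
# COL1..COL6 are never assigned, so each one expands to an empty string
# and only the five separating spaces remain in the pattern.
unset COL1 COL2 COL3 COL4 COL5 COL6
pattern="$COL1 $COL2 $COL3 $COL4 $COL5 $COL6"
printf '[%s]\n' "$pattern"      # brackets make the blanks visible
echo "${#pattern}"              # length of the pattern

# One way the variables could be filled: let read split the line itself.
read -r COL1 COL2 COL3 <<EOF
T S Eliot
EOF
filled="$COL1 $COL2 $COL3"
echo "$filled"
```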

baris35 · September 25, 2018, 9:46am

Hello Don,
Thank you, I am on Ubuntu 18.04 (Bionic).
I edited the script following your remark. Now it gives an empty result.

while read line
do
    count=`echo $line | wc -w`
    if [ $count -eq 6 ]; then
        grep "$COL1 $COL2 $COL3 $COL4 $COL5 $COL6" -B1 fileB
    else
        if [ $count -eq 5 ]; then
            grep "$COL1 $COL2 $COL3 $COL4 $COL5" -B1 fileB
        else
            if [ $count -eq 4 ]; then
                grep "$COL1 $COL2 $COL3 $COL4" -B1 fileB
            else
                if [ $count -eq 3 ]; then
                    grep "$COL1 $COL2 $COL3" -B1 fileB
                else
                    if [ $count -eq 2 ]; then
                        break;
                    fi
                fi
            fi
        fi
    fi
done < fileA > fileC

Many Thanks
Boris

Peasant · September 25, 2018, 11:33am

Well, that's just a clutter of GNU greps in broken loops, to be honest.

This assumes that a line holding a number always starts with a digit or a + sign.

Give this a shot :

awk 'FNR==NR { person[$0]; next } /^[+0-9]/ { number=$0 } $0 in person { printf("%s\n%s\n", $0, number) }' fileA fileB > fileC

Try to use awk for such assignments, it will prove irreplaceable and fast.
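A variant that needs no assumption about the + sign, offered only as a sketch: it assumes instead that the wanted number is always the line directly before the name in fileB, and it prints the pairs in fileA order. The temporary directory and the array name before are illustrative choices, not part of the original one-liner.

```shell
# Build small stand-ins for fileA and fileB in a temporary directory.
dir=$(mktemp -d)
printf '%s\n' 'T S Eliot' 'J L Borges' 'L Aragon' 'L L Aragon' > "$dir/fileA"
printf '%s\n' '+2828288889999' 'L Aragon' '+330022110033' 'J L Borges' \
              '+440011223344' 'T S Eliot' > "$dir/fileB"

# First file (fileB): remember, for every line, the line that precedes it.
# Second file (fileA): print each name that was seen, followed by that line.
awk 'FNR == NR { before[$0] = prev; prev = $0; next }
     $0 in before { print; print before[$0] }' "$dir/fileB" "$dir/fileA" > "$dir/fileC"

result=$(cat "$dir/fileC")
printf '%s\n' "$result"
rm -rf "$dir"
```

Note that fileB is given first here, the reverse of the one-liner above, because the lookup table is built from fileB.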

Also, baris35, after 200 posts a more sane approach to coding and asking questions would be advisable.
It looks like you didn't even read the questions from the other posters, and just added five ifs without any consideration, thinking, or reading about it.
They could just as well have pasted an awk line to solve your problem (probably better than mine :) ), but we are here to try to make you think about the problem yourself.

Hope that helps.
Regards
Peasant.

baris35 · September 25, 2018, 11:48am

Dear Peasant,
Thanks, but that code gives an empty file at my end. The reason is most likely related to your assumption about the + sign etc.
Regarding the 5x nested if, you are right. I did not like it, but I took that code from another thread since it was the closest to my case.

Anyway,
Thank You All for your time. I will try to find a different way to sort it out.

Kind regards
Boris

Peasant · September 25, 2018, 12:09pm

Well, a representative input should be given then:
the smallest possible portion of your data that covers your request entirely.

Without that, it is not possible for anyone here to magically cover all cases.

Regards
Peasant.

baris35 · September 25, 2018, 12:14pm

Thank you Peasant

Regards
Boris

MadeInGermany · September 25, 2018, 1:41pm

Instead of "else if" and nesting ever deeper, the shell has "elif", which stays on the same level.

if ... then ... elif ... then ... elif ... then ... else ... fi

But if each condition tests the same variable then it is more appropriate to use a "case-esac":

set -f
while read line
do
  set -- $line
  case $# in
  6) grep -B1 -w "$1 $2 $3 $4 $5 $6" fileB
  ;;
  5) grep -B1 -w "$1 $2 $3 $4 $5" fileB
  ;;
  4) grep -B1 -w "$1 $2 $3 $4" fileB
  ;;
  3) grep -B1 -w "$1 $2 $3" fileB
  ;;
  2) grep -B1 -w "$1 $2" fileB
  ;;
  *)
  esac
done < fileA

baris35 · September 25, 2018, 2:12pm

Dear MadeInGermany,
Thanks for your answer. It worked at my end with the expected result.
I have just sorted it out another way. Since I could not get it working with the approaches explained earlier by other valuable board members, I replaced spaces with underscores in all my source files. That way, every line of fileA became a single column. Then I ran the script below. Finally, I replaced the underscores with spaces again to bring the output file into the expected form:

#!/bin/bash
while read COL1
do
        count=`echo $COL1 | wc -w`
        if [ $count -eq 2 ]; then
                break;
        else
                grep $COL1 -B1 fileB
fi
done<fileA > fileC
sed -i 's/_/ /g' fileC

Many Thanks
Boris

MadeInGermany · September 25, 2018, 3:12pm

But how can $count be 2 then? Isn't it always 1?
And, if you put the $COL1 in quotes, you can again work with the original space-separated files, and it boils down to

while read line
do
  grep -B1 -- "$line" fileB
done <fileA
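A slightly hardened sketch of the same loop, under a few assumptions: GNU grep and tac are available (as on Ubuntu), only the first match per name is wanted, and the name should be printed before its number. The -F and -x options make grep treat each line as a literal string that must match a whole line, so "L Aragon" cannot match inside "L L Aragon". The temporary files are stand-ins for fileA and fileB.

```shell
dir=$(mktemp -d)
printf '%s\n' 'T S Eliot' 'J L Borges' 'L Aragon' 'L L Aragon' \
              'T S Eliot 4 0 3333' > "$dir/fileA"
printf '%s\n' '+2828288889999' 'L Aragon' '+330000000000' \
              'Tinto Brass Milano Italy' '+330022110033' 'J L Borges' \
              '+440011223344' 'T S Eliot' > "$dir/fileB"

# IFS= and -r keep leading blanks and backslashes in each line intact.
while IFS= read -r line; do
    [ -n "$line" ] || continue          # an empty pattern would match empty lines
    # -m1 stops after the first match; -B1 adds the line before it.
    if match=$(grep -F -x -B1 -m1 -- "$line" "$dir/fileB"); then
        printf '%s\n' "$match" | tac    # swap to: name first, then number
    fi
done < "$dir/fileA" > "$dir/fileC"

result=$(cat "$dir/fileC")
printf '%s\n' "$result"
rm -rf "$dir"
```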

baris35 · September 25, 2018, 3:31pm

You are right. I took into account another case but now I see that it is not possible.
I will be using the code which you posted.

Many thanks
Boris

RudiC · September 25, 2018, 4:23pm

How far would the -f (file) option to grep get you?

grep -B1 -ffile1 file2
+2828288889999
L Aragon
--
+330022110033
J L Borges
+440011223344
T S Eliot

baris35 · September 25, 2018, 4:32pm

Dear RudiC,
Many thanks, but it gives the results in reverse order. Maybe there is another option to list them in reverse with a one-liner.

Expected order:

Eliot
Borges
Aragon

Your output:

Aragon
Borges
Eliot

Thanks to you all

Kind regards
Boris

RudiC · September 25, 2018, 5:03pm

It's the order of your file2. Howsoever, try

grep -B1 -ffile1 file2 | tac
T S Eliot
+440011223344
J L Borges
+330022110033
--
L Aragon
+2828288889999
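The "--" line in that output is the separator GNU grep prints between non-adjacent context groups. A sketch of how it could be suppressed, assuming GNU grep (the --no-group-separator option is a GNU extension) and using temporary stand-ins for file1 and file2:

```shell
dir=$(mktemp -d)
printf '%s\n' 'T S Eliot' 'J L Borges' 'L Aragon' > "$dir/file1"
printf '%s\n' '+2828288889999' 'L Aragon' '+330000000000' \
              'Tinto Brass Milano Italy' '+330022110033' 'J L Borges' \
              '+440011223344' 'T S Eliot' > "$dir/file2"

# -F -x: literal, whole-line matches; -B1: one line of leading context;
# --no-group-separator: do not print "--" between separate groups.
result=$(grep --no-group-separator -F -x -B1 -f "$dir/file1" "$dir/file2" | tac)
printf '%s\n' "$result"
rm -rf "$dir"
```

The order still follows file2 read backwards, not file1, so this only fits when the two files happen to be in opposite order, as in the sample data.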

baris35 · September 25, 2018, 5:24pm

Thanks RudiC,
tac and cat give different orders. Interesting command.

Many thanks
Boris

rovf · September 26, 2018, 2:44am

Getting no output just means that your pattern doesn't match the content of any line in fileB, or that count is greater than 6. You did not write how you set the variables COL1, COL2 etc. The easiest way to check what is actually being searched for is to turn on tracing using

set -x

My guess is that fileB simply doesn't have any line matching the pattern.
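A small sketch of what tracing reveals, assuming COL1 and COL2 are unset as in the posted script. The trace goes to stderr, so it is captured here into a variable (the name trace is just for illustration); the exact quoting in the trace differs between shells.

```shell
# COL1 and COL2 are deliberately left unset, as in the posted script.
unset COL1 COL2

# Run one command with tracing on and capture the trace from stderr.
# "|| true" only keeps the failed search from aborting a strict script.
trace=$( { set -x; grep "$COL1 $COL2" /dev/null || true; } 2>&1 )

# The trace shows the grep command after expansion: the pattern is a lone blank.
printf '%s\n' "$trace"
```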

Another point, which is just a matter of style: you don't need deeply nested ifs here. Instead of

if CONDITION1; then
  foo
else
  if CONDITION2; then
    bar
  fi
fi

it is much more readable to write

if CONDITION1; then
  foo
elif CONDITION2; then
  bar
fi