Help printing lines where the record in a specific column changes

perl_beginner · February 23, 2014, 10:40pm

Input file 1:

-       7367    8198
-       8225    9383
+       9570    10353

Input file 2:

-       2917    3667
-       3851    4250
+       4517    6302
+       6302    6740
+       6768    7524
+       7648    8170
+       8272    8896
+       8908    9915
-       10010   10796
-       10788   11514
+       11588   12533
+       12545   13874

Input file 3:

+       56      1190
+       1199    2606
-       2698    3337

Desired Output file 1:

-       8225    9383
+       9570    10353

Desired Output file 2:

-       3851    4250
+       4517    6302
+       8908    9915
-       10010   10796
-       10788   11514
+       11588   12533

Desired Output file 3:

+       1199    2606
-       2698    3337

I would like to print only the lines where the first column changes from "+" to "-" or from "-" to "+".
The situation in Input file 2 is a bit more challenging.
I don't have much idea how to solve it.
Thanks for any advice.

Don_Cragun · February 23, 2014, 11:10pm

What output would you expect from the input:

-       1       2
+       3       4
-       5       6
+       7       8

perl_beginner · February 23, 2014, 11:19pm

Hi Don Cragun,

Thanks for your reply.
If the input is:

-       1       2
+       3       4
-       5       6
+       7       8

The output should be exactly the same:

-       1       2
+       3       4
-       5       6
+       7       8

I tried the following command:

awk '_[$1]++==0{print}' input_file

But it doesn't return what I want.
I am a bit confused about how to write the correct code to fix the problem.

Thanks for any advice.

Don_Cragun · February 24, 2014, 12:04am

It looks like the script you were trying would just print the 1st line in each file for each different value in the 1st field. The following awk program seems to do what you want:

awk '
$1 == f1 || NR == 1 {
        f0 = $0 "\n"
        f1 = $1
        next
}
{       print f0 $0
        f0 = ""
        f1 = $1
}' input_file

If you want to try this on a Solaris/SunOS system, use /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk instead of the default /usr/bin/awk.
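As a quick check, here is a minimal sketch that runs the program above on the data from Input file 1 in the first post. The sign_changes function name is only a hypothetical wrapper for illustration, and the sample lines use single spaces between fields, which is an assumption; awk splits fields on any run of blanks, so the selection of lines is the same.

```shell
# Hypothetical wrapper around the awk program shown above.
sign_changes() {
    awk '
    $1 == f1 || NR == 1 {
            f0 = $0 "\n"
            f1 = $1
            next
    }
    {       print f0 $0
            f0 = ""
            f1 = $1
    }' "$@"
}

# Data taken from Input file 1 in the first post (single-space separated here).
printf '%s\n' '- 7367 8198' '- 8225 9383' '+ 9570 10353' | sign_changes
```

This should print the two lines listed under Desired Output file 1: the last "-" line and the first "+" line.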

ahamed101 · February 24, 2014, 12:05am

Try this

 awk 'p && $1 != p{print v; print}{p=$1;v=$0}' infile

--ahamed

Don_Cragun · February 24, 2014, 12:11am

When given the input shown in message #3 in this thread, the above script produces:

-       1       2
+       3       4
+       3       4
-       5       6
-       5       6
+       7       8

instead of:

-       1       2
+       3       4
-       5       6
+       7       8

ahamed101 · February 24, 2014, 12:16am

Thanks Don. I tested only with the OP data.

Updated

awk 'p && $1 != p{print v; print;p=v="";next}{p=$1;v=$0}'  infile

On second thought, technically speaking, the OP wanted the output whenever the sign changed.
In a parallel universe, that output would still be valid.

--ahamed

perl_beginner · February 24, 2014, 4:16am

Hi Don Cragun,

I tried to run your code:

awk ' $1 == f1 || NR == 1 { f0 = $0 "\n" f1 = $1 next } { print f0 $0 f0 = "" f1 = $1 }' Input_file

awk:  $1 == f1 || NR == 1 { f0 = $0 "\n" f1 = $1 next } { print f0 $0 f0 = "" f1 = $1 }
awk:                                        ^ syntax error
awk:  $1 == f1 || NR == 1 { f0 = $0 "\n" f1 = $1 next } { print f0 $0 f0 = "" f1 = $1 }
awk:                                                                     ^ syntax error

Is it because I don't have /usr/xpg4/bin/awk installed?
It seems I am running /usr/bin/awk.

Thanks for any advice.

anbu23 · February 24, 2014, 4:19am

The syntax error is due to missing semicolons. Add a semicolon to separate multiple statements on one line:

awk ' $1 == f1 || NR == 1 { f0 = $0 "\n" ; f1 = $1 ; next } { print f0 $0 ; f0 = "" ; f1 = $1 }' Input_file
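To illustrate the point with something smaller, here is a minimal sketch using a made-up two-field input line (the variable names x, y, multi, and single are just placeholders). In awk, a newline ends a statement, so when several statements are joined onto one line, each newline has to be replaced by a semicolon.

```shell
# Multi-line form: each newline ends a statement.
multi=$(printf 'a b\n' | awk '{
        x = $1
        y = $2
        print y, x
}')

# One-line form: the same statements separated by semicolons.
single=$(printf 'a b\n' | awk '{ x = $1; y = $2; print y, x }')

printf '%s\n%s\n' "$multi" "$single"
```

Both forms should print the same line. Without the semicolons, awk reads the statements on one line as a single expression and reports a syntax error like the one shown above.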

Don_Cragun · February 24, 2014, 4:41am

Do you see any difference between my code:

awk '
$1 == f1 || NR == 1 {
        f0 = $0 "\n"
        f1 = $1
        next
}
{       print f0 $0
        f0 = ""
        f1 = $1
}' input_file

and what you have above? Please try running my code before saying my code doesn't work. Removing newlines from the middle of an awk program might not change anything, or it might completely change the meaning. Most of the newlines you removed changed the behavior of my suggested code. You can turn my awk code into a 1-line script as suggested by anbu23; I, however, prefer readable code where I can see the structure of the code by the indentation.

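What operating system are you using? I said:

If you want to try this on a Solaris/SunOS system, use /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk instead of the default /usr/bin/awk.

If you don't have /usr/xpg4/bin/awk, I assume you are not using a Solaris system running the SunOS operating system.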

perl_beginner · February 24, 2014, 8:14am

Hi anbu23,

Thanks for pointing out my mistake in using Don Cragun's code.

---------- Post updated at 08:14 AM ---------- Previous update was at 07:57 AM ----------

Hi Don Cragun,

Sorry for misunderstanding and misusing your awk program.
I'm using an x86_64 GNU/Linux system.

I'm currently trying to understand the logic of your awk program.
Thank you very much for your advice and for sharing your knowledge.

I apologize for my mistake.
Thanks again.

Don_Cragun · February 24, 2014, 11:56am

This is a fairly simple awk script. Here it is again with comments explaining what each line does:

awk ' 
$1 == f1 || NR == 1 {   # If the 1st field is the same as the 1st field on the
                        # previous line, or if this is the 1st line in the file:
        f0 = $0 "\n"    #       Save the current line with a newline for output
                        #       when we see a different value in the 1st field.
                        #       ($0 is the current line without the terminating
                        #       newline character.)
        f1 = $1         #       Save the current 1st field to compare to the 1st
                        #       field on the next line.
        next            #       Skip to next input line.
}
{                       # To get to here, the 1st field on this line is not the
                        # same as the 1st field on the previous line:
        print f0 $0     #       Print the saved line and the current line.  The
                        #       print command adds a newline at the end of
                        #       whatever it prints.
        f0 = ""         #       Clear the saved line.  (We do not want to print
                        #       the current line twice if the 1st field in the
                        #       next line does not match the 1st field in this
                        #       line.)
        f1 = $1         #       Save the current 1st field to compare to the 1st
                        #       field on the next line.
}' input_file           # The input for this script is a file named "input_file".

perl_beginner · February 24, 2014, 10:42pm

Dear Don Cragun,

Thank you, I really appreciate your input.
Thanks for taking the time to guide me and explain your awk program in detail.

Thank you very much.

---------- Post updated at 10:42 PM ---------- Previous update was at 09:54 PM ----------

Dear Don Cragun,

After running your awk program, I have a similar question and need your advice:
Input file 1:

+ 123897 125349 
- 125727 125836 
- 127179 128103 
+ 128356 128848 
- 128476 130282 
- 135728 136490
+ 136845 138219 
- 138318 139845

Output file 1:

+ 123897 125349 
- 125727 125836 
- 127179 128103 
+ 128356 128848 
+ 128356 128848 
- 128476 130282 
- 135728 136490
+ 136845 138219 
+ 136845 138219 
- 138318 139845

Input file 2:

- 127179 128103 
+ 128356 128848 
- 128476 130282

Output file 2:

- 127179 128103 
+ 128356 128848 
+ 128356 128848 
- 128476 130282

Input file 3:

+ 127179 128103 
- 128356 128848 
+ 128476 130282 
- 135728 136490

Output file 3:

+ 127179 128103 
- 128356 128848 
- 128356 128848 
+ 128476 130282 
+ 128476 130282 
- 135728 136490

Basically, I would like to duplicate the middle record when the pattern is "- + -" or "+ - +", so that it becomes "- + + -" or "+ - - +".

Kindly let me know if my question is unclear.
Thanks again.

ahamed101 · February 24, 2014, 10:53pm

Yes, a bit confused. Sample input/output?

--ahamed

perl_beginner · February 24, 2014, 10:58pm

Hi anbu23,

I have one more question that might need your advice in thread#23.

Kindly let me know if you have any idea about it.
Thanks.

---------- Post updated at 10:58 PM ---------- Previous update was at 10:54 PM ----------

Hi ahamed101,

Below are a few cases of different input:
Input file 1

-       45472   46630
+       46817   47600
-       48573   49767

Desired Output file 1

-       45472   46630
+       46817   47600
+       46817   47600
-       48573   49767

Input file 2

+       81446   82784
-       82843   83058
+       89725   90700

Desired Output file 2

+       81446   82784
-       82843   83058
-       82843   83058
+       89725   90700

Basically, I just want to duplicate the middle record when the pattern is "- + -" or "+ - +", so that it becomes "- + + -" or "+ - - +".

Sorry for confusing you.

ahamed101 · February 24, 2014, 11:04pm

Try this...

awk 'p && $1 != p{print v; print}{p=$1;v=$0}'  infile

--ahamed

Don_Cragun · February 24, 2014, 11:14pm

From the comments in my code in message #12 in this thread, can you figure out which line needs to change so an input line will be printed twice when the 1st field changes on three consecutive lines? If you can figure out which line needs to change, can you figure out how to change it?

perl_beginner · February 24, 2014, 11:41pm

Hi Don Cragun,

Should I change these lines:

        print f0 $0     #       Print the saved line and the current line.  The
                        #       print command adds a newline at the end of
                        #       whatever it prints.
        f0 = ""         #       Clear the saved line.  (We do not want to print
                        #       the current line twice if the 1st field in the
                        #       next line does not match the 1st field in this
                        #       line.)

Don_Cragun · February 25, 2014, 12:29am

Yes, one of those lines is it. If you change:

        f0 = ""         #       Clear the saved line.  (We do not want to print
                        #       the current line twice if the 1st field in the
                        #       next line does not match the 1st field in this
                        #       line.)

to:

        f0 = $0 "\n"    #       Save the current line with a newline for output
                        #       when we see a different value in the 1st field.
                        #       ($0 is the current line without the terminating
                        #       newline character.)

it should do what you want.

If you re-read this thread, you might notice that the script ahamed101 provided in message #5 in this thread is a condensed version of this script. My comment in message #6 in this thread pointed out that that script did what you are now requesting instead of doing what you originally requested.
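For reference, here is a minimal sketch of the whole program with that one line changed, run on the data from Input file 1 in the follow-up request. The dup_on_change function name is only a hypothetical wrapper for illustration, and the sample lines use single spaces between fields rather than the original spacing.

```shell
# Hypothetical wrapper name, for illustration only.
dup_on_change() {
    awk '
    $1 == f1 || NR == 1 {
            f0 = $0 "\n"    # Remember this line in case the next one differs.
            f1 = $1
            next
    }
    {       print f0 $0     # Print the remembered line and the current line.
            f0 = $0 "\n"    # Changed line: keep the current line, do not clear it.
            f1 = $1
    }' "$@"
}

# Data taken from the follow-up Input file 1 (single-space separated here).
printf '%s\n' '- 45472 46630' '+ 46817 47600' '- 48573 49767' | dup_on_change
```

With this input, the middle "+" line should be printed twice, once as the end of the first change and once as the start of the second, giving the four lines shown in the desired output.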