Extract numbers below words with awk

Hi all,

I need some help, please. I have a Sales.txt file that contains a block of information for
every product sold, in the pattern shown below (only 2 products are shown).

 
NEW BLOCK
SALE DATA 
PRODUCT           SERIAL             
79833269999      146701011945004  
.Some other data 
.Some other data 
.Some other data 
SALE SERIAL                     SALE NUMBER
7-4324990101                    4324990101         
STORE NUMBER
7-43212     
END OF BLOCK
 
NEW BLOCK
SALE DATA
PRODUCT           SERIAL             
79806743123      1598670108411112  
.Some other data 
.Some other data 
.Some other data 
SALE SERIAL                     SALE NUMBER
7-4324995448                    4324995448        
STORE NUMBER
7-43213     
END OF BLOCK

I've used awk for basic things, but this is beyond my knowledge. I'd like to extract only the number below the word "PRODUCT" and the corresponding number below "SALE SERIAL", and store the processed data in a new file.

For the data blocks above, the desired result would be:

79833269999 7-4324990101
79806743123 7-4324995448

Could somebody help me with this, please? I'd be grateful for any help or suggestions.

Best regards

awk '/PRODUCT/ {getline ; print}' sales.txt

Many thanks jpradley for your answer,

I've tried adding SALE SERIAL to the pattern to get the number below that word as well.

awk '/PRODUCT/||/SALE SERIAL/ {getline ; print}' sales.txt > OutSales.txt

But the result is:

79833269999 146701011945004
7-4324990101 4324990101
79806743123 1598670108411112
7-4324995448 4324995448

How can I get only the number in the first column below each word, so that the output looks as follows? (I mean, keeping the numbers in red on the same line and dropping the other ones.)

79833269999 7-4324990101 -->numbers extracted from block 1
79806743123 7-4324995448 -->numbers extracted from block 2

Thanks again.

Best regards.

awk '
/PRODUCT/ {getline; print}
/SALE SERIAL/ {getline; print $1}
' sales.txt

Now I have the result I wanted. Many thanks, jpradley, for your help and time. With your tips and after several tests I've found code that does what I asked for. It consists of 3 separate commands that I put in a .sh file, but I'm not sure how to join them into one command so that "awk" is written only once.

 
awk '/^PRODUCT/ {getline;
 print $1
}
/^SALE SERIAL/ {getline;
 print $1
}' Sales.txt > Sales_Temp.txt
awk 'ORS = NR%2 ? " " : "\n"
' Sales_Temp.txt > Sales_Filtered.txt
 
rm Sales_Temp.txt

Really thanks again.

#! /usr/bin/perl
open FH,"<a.txt";
while(<FH>){
	if (/PRODUCT/ || /SALE SERIAL/){
		$flag=1;
		next;
	}
	if ($flag==1){
		$flag=0;
		@tmp=split(/( |	)+/,$_);
		print $tmp[0]," ";
	}
	print "\n" if /END OF BLOCK/;
}
close FH;

Hi summer_cherry, thanks for contributing to my question.

I've tried your code, replacing the source file name like this

open FH,"<Sales.txt"; 

instead of

 
open FH,"<a.txt";

But it doesn't work.

The source file is "Sales.txt" and I'd like the destination file to be "Sales_Filtered.txt".

How do I edit the file names within the code?

Thanks in advance.

Try this:

awk '/^PRODUCT/ {getline; printf("%s%s", $1, NR%2? FS:RS)}
/^SALE SERIAL/ {getline; printf("%s%s", $1, NR%2? FS:RS)}
' Sales.txt > Sales_Filtered.txt

Regards

Hi Franklin52,

Many thanks for your answer.

I've tried your suggestion

 
awk '/^PRODUCT/ {getline; printf("%s%s", $1, NR%2? FS:RS)} 
/^SALE SERIAL/ {getline; printf("%s%s", $1, NR%2? FS:RS)}
' Sales.txt

and a slightly shorter variant

awk '/^PRODUCT/||/^SALE SERIAL/ {getline; printf("%s%s", $1, NR%2? FS:RS)}' Sales.txt

but the result is the same for both, as follows:

79833269999
7-4324990101 79806743123 7-4324995448

and not what I'd like, which is:

79833269999 7-4324990101
79806743123 7-4324995448

What is missing?

I'm beginning to understand awk's logic. Thanks for your help.

Best regards
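
A likely explanation for the output above: NR counts every input line awk has read, including blank lines and the lines consumed by getline, so whether NR is odd or even says nothing about whether a value is the first or the second of a pair. The sketch below prints NR and NR%2 next to each extracted value. The file name demo_sales.txt and its trimmed-down contents are made up for illustration.

```shell
# Build a small sample file (hypothetical name, one shortened block).
cat > demo_sales.txt <<'EOF'
NEW BLOCK
PRODUCT           SERIAL
79833269999      146701011945004
SALE SERIAL                     SALE NUMBER
7-4324990101                    4324990101
END OF BLOCK
EOF

# After getline, NR is the number of the input line that holds the value.
# Print NR, its parity, and the first field of that line.
nr_report=$(awk '/^PRODUCT/ || /^SALE SERIAL/ {getline; print NR, NR%2, $1}' demo_sales.txt)
echo "$nr_report"

rm -f demo_sales.txt
```

In this sample both values sit on odd-numbered lines (3 and 5), so a test on NR%2 would put a space after both of them and never a newline. The parity depends on the file layout, not on the pairing.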

Sorry, I overlooked something. Try this:

awk '/^PRODUCT/ {getline; printf("%s ", $1)} 
/^SALE SERIAL/ {getline; printf("%s\n", $1)}
' Sales.txt

Regards
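
For comparison, here is a sketch of the same extraction without getline. It remembers which header was seen last and picks up the first field of the following line. The variable names (want, product) and the demo file names are assumptions made for this example, and the sample data is a shortened copy of the blocks from the first post.

```shell
# Sample input in the layout from the first post (hypothetical file name).
cat > demo_sales.txt <<'EOF'
NEW BLOCK
SALE DATA
PRODUCT           SERIAL
79833269999      146701011945004
.Some other data
SALE SERIAL                     SALE NUMBER
7-4324990101                    4324990101
STORE NUMBER
7-43212
END OF BLOCK

NEW BLOCK
SALE DATA
PRODUCT           SERIAL
79806743123      1598670108411112
.Some other data
SALE SERIAL                     SALE NUMBER
7-4324995448                    4324995448
STORE NUMBER
7-43213
END OF BLOCK
EOF

awk '
want == "product" { product = $1; want = ""; next }        # line right after PRODUCT
want == "serial"  { print product, $1; want = ""; next }   # line right after SALE SERIAL
/^PRODUCT/        { want = "product" }                     # remember which header we saw
/^SALE SERIAL/    { want = "serial" }
' demo_sales.txt > demo_filtered.txt

result=$(cat demo_filtered.txt)
echo "$result"

rm -f demo_sales.txt demo_filtered.txt
```

Because the product number is stored and printed together with the sale serial, each output line is written in one go and there is no need for a temporary file or a second awk pass.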

Hi, I'm a newbie to scripting. Thanks in advance for your input. I have a flat file like the one below:
"Mon Jan 7 11:32:25 EST 2008",
291
"Mon Jan 7 12:01:00 EST 2008",
291
"Mon Jan 7 18:01:00 EST 2008",
318
"Tue Jan 8 00:01:00 EST 2008",
357
"Tue Jan 8 06:01:00 EST 2008",
352
"Tue Jan 8 12:01:00 EST 2008",
388
"Tue Jan 8 18:01:00 EST 2008",
387
"Wed Jan 9 00:01:00 EST 2008",
387
"Wed Jan 9 06:01:00 EST 2008",
406
"Wed Jan 9 12:01:00 EST 2008",
403
"Wed Jan 9 18:01:00 EST 2008",
406
"Thu Jan 10 00:01:00 EST 2008",
414
"Thu Jan 10 06:01:00 EST 2008",

I need to merge lines 1 and 2, then lines 3 and 4, and so on, so that it becomes:
"Mon Jan 7 11:32:25 EST 2008", 291
"Mon Jan 7 12:01:00 EST 2008",291
"Mon Jan 7 18:01:00 EST 2008",318
"Tue Jan 8 00:01:00 EST 2008",357
"Tue Jan 8 06:01:00 EST 2008",352
"Tue Jan 8 12:01:00 EST 2008",388
"Tue Jan 8 18:01:00 EST 2008",387
"Wed Jan 9 00:01:00 EST 2008",387
"Wed Jan 9 06:01:00 EST 2008",406
"Wed Jan 9 12:01:00 EST 2008",403
"Wed Jan 9 18:01:00 EST 2008",406
"Thu Jan 10 00:01:00 EST 2008",414
"Thu Jan 10 06:01:00 EST 2008",.......

Any quick way to do it?? Thanks!

Perfect man!!!

I understand almost all of the code:

getline= Get data from line below
$1=Get data from column 1

Just one more thing: what do these mean in your code?

"%s "=
"%s\n"=

I don't get the meaning.

Thanks in advance

"%s " = print a space after the line
"%s\n"= print a newline after the line

Regards
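
To make this concrete: in printf, %s is replaced by the argument, and anything else in the format string is printed literally after it. A minimal sketch, with the two numbers from block 1 hard-coded as arguments purely for illustration:

```shell
# "%s " prints the value followed by a space, so the next output stays on the same line.
# "%s\n" prints the value followed by a newline, which ends the output line.
pair=$(awk 'BEGIN { printf("%s ", "79833269999"); printf("%s\n", "7-4324990101") }')
echo "$pair"
```

Unlike print, printf adds no newline of its own, which is why the two values end up on one line.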

Many thanks, Franklin52. I've learned several things through this question.
Thanks for your contribution and for taking the time to share what you know.

Hi flyingfish,

Part of my problem was exactly what you're asking.

I found some sample code that does what you want; I mean, it puts odd lines in column 1
and even lines in column 2. I'm not completely sure how it works, but it works.

Try with:
(To print the processed data to the screen)

  awk 'ORS = NR%2 ? " " : "\n"' Source.txt

or

(To send the processed data from Source.txt to the file Destination.txt)

awk 'ORS = NR%2 ? " " : "\n"' Source.txt > Destination.txt

I hope it helps,

Really thanks to all,

Best regards

flyingfish,

First, don't hijack someone else's thread; start your own thread instead.
With cgkmal's solution you get a space between the first and the second line. To avoid that, you can use this:

awk '{printf("%s%s",$0,NR%2?"":ORS)}' file

As for cgkmal's command:

awk 'ORS = NR%2 ? " " : "\n"' Source.txt

Here we use the conditional operator. The syntax is: expr ? value1 : value2

ORS = assigns the resulting value to the output record separator ORS

NR%2 ? " " : "\n"

Here the modulo operator % gives the remainder of dividing the line number by 2: odd lines give 1 (true) and even lines give 0 (false).
If the result is true (odd lines), ORS is set to a space; otherwise ORS is set to a newline.

Hope this helps.

Regards
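
The two commands can be compared side by side. The sketch below uses the first four lines of flyingfish's sample as input; the file name demo_source.txt and the variable names are made up for the example.

```shell
# Four lines of sample input (hypothetical file name).
printf '%s\n' \
  '"Mon Jan 7 11:32:25 EST 2008",' 291 \
  '"Mon Jan 7 12:01:00 EST 2008",' 291 > demo_source.txt

# ORS form: the assignment is used as the pattern. Its value (" " or "\n") is a
# non-empty string, which counts as true, so every line is printed with that ORS.
with_space=$(awk 'ORS = NR%2 ? " " : "\n"' demo_source.txt)

# printf form: odd lines are followed by nothing, even lines by ORS (a newline).
no_space=$(awk '{printf("%s%s", $0, NR%2 ? "" : ORS)}' demo_source.txt)

echo "$with_space"
echo "$no_space"

rm -f demo_source.txt
```

Setting ORS to an empty string in the first form would not work as a drop-in change: an empty string counts as false when used as a pattern, so the odd lines would not be printed at all. That is one reason to prefer the printf form when no separator is wanted.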

Worked like magic:

awk 'ORS = NR%2 ? " " : "\n"' Source.txt > Destination.txt

Nice to know it worked for you, flyingfish.

Hi Franklin52,

Your explanation helps A LOT! I understand the logic used to solve
the conditional part much better now.

Many thanks for your valuable assistance.

Best regards

Maybe you need to replace the path below with yours, and put the script in the same folder as your Sales.txt file.

#! /usr/bin/perl