Awk/sed/cut to filter out records from a file based on criteria

I have two files and would need to filter out records based on certain criteria, these column are of variable lengths, but the lengths are uniform throughout all the records of the file. I have shown a sample of three records below. Line 1-9 is the item number "0227546_1" in the case of the first record, the department number is line 361-369, which would be 3101_1_1 in the first record. I would want to compare two files for item numbers (keep in mind they could be at different records in these two files) and if they have a different department number I would need the entire record to be filtered out in a third file.

I have also included a sample of the second file and expected output as there was a change in dept #, it only shows two records.

Any help will be greatly appreciated, I have been struggling with this for a while.

First file:

0227546_1           0227546,OIL FILTER      PH3675     HONEYWELL                                                        
0227546A            0227546,OIL FILTER      PH3675     HONEYWELL                
0227546             0227546,OIL FILTER      PH3675     HONEYWELL                
0                   0                                                           
3101_1_1            Under Car                                                   
3101_1              Parts and Repair                                            
3101                Auto Retail                                                 
301                 Automotive                                                  
2                   Hardlines                                                   
9000                WORLDWIDE                                                   
EA  EACH                0                   0                                                                                                   
0227546A            0227546,OIL FILTER      PH3675     HONEYWELL                
0227546A            0227546,OIL FILTER      PH3675     HONEYWELL                
0                   0                                                           
3101_1_1            Under Car                                                   
3101_1              Parts and Repair                                            
3101                Auto Retail                                                 
301                 Automotive                                                  
2                   Hardlines                                                   
9000                WORLDWIDE                                                   
0                   0                                                                                                   


0249068_1           0249068,Fresco Castiglione Chianti                                                                  
0249068A            0249068,Fresco Castiglione Chianti                          
0249068             0249068,Fresco Castiglione Chianti                          
0                   0                                                           
1022_1_2            Table                                                       
1022_1              Red                                                         
1022                Wine                                                        
422                 Wine                                                        
3                   Consumables                                                 
9000                WORLDWIDE                                                   
EA  EACH                0                   0                                                                                                   
0249068A            0249068,Fresco Castiglione Chianti                          
0249068A            0249068,Fresco Castiglione Chianti                          
0                   0                                                           
1022_1_2            Table                                                       
1022_1              Red                                                         
1022                Wine                                                        
422                 Wine                                                        
3                   Consumables                                                 
9000                WORLDWIDE                                                   
0                   0                                                                                                   


0274321_1           0274321,SIMPLY PERFECT SILVER  KING                                                                 
0274321A            0274321,SIMPLY PERFECT SILVER  KING                         
0274321             0274321,SIMPLY PERFECT SILVER  KING                         
0                   0                                                           
1478_3_2            Down Pillows                                                
1478_3              Down Products                                               
1478                White Goods                                                 
478                 Domestics                                                   
2                   Hardlines                                                   
9000                WORLDWIDE                                                   
EA  EACH                0                   0                                                                                                   
0274321A            0274321,SIMPLY PERFECT SILVER  KING                         
0274321A            0274321,SIMPLY PERFECT SILVER  KING                         
0                   0                                                           
1478_3_2            Down Pillows                                                
1478_3              Down Products                                               
1478                White Goods                                                 
478                 Domestics                                                   
2                   Hardlines                                                   
9000                WORLDWIDE                                                   
0                   0           

Second File:

0249068_1           0249068,Fresco Castiglione Chianti                                                                  
0249068A            0249068,Fresco Castiglione Chianti                          
0249068             0249068,Fresco Castiglione Chianti                          
0                   0                                                           
1022_1_2            Table                                                       
1022_1              Red                                                         
1022                Wine                                                        
422                 Wine                                                        
3                   Consumables                                                 
9000                WORLDWIDE                                                   
EA  EACH                0                   0                                                                                                   
0249068A            0249068,Fresco Castiglione Chianti                          
0249068A            0249068,Fresco Castiglione Chianti                          
0                   0                                                           
1022_1_2            Table                                                       
1022_1              Red                                                         
1022                Wine                                                        
422                 Wine                                                        
3                   Consumables                                                 
9000                WORLDWIDE                                                   
0                   0                                                                                                   


0274321_1           0274321,SIMPLY PERFECT SILVER  KING                                                                 
0274321A            0274321,SIMPLY PERFECT SILVER  KING                         
0274321             0274321,SIMPLY PERFECT SILVER  KING                         
0                   0                                                           
1478_5_1            Down Pillows                                                
1478_3              Down Products                                               
1478                White Goods                                                 
478                 Domestics                                                   
2                   Hardlines                                                   
9000                WORLDWIDE                                                   
EA  EACH                0                   0                                                                                                   
0274321A            0274321,SIMPLY PERFECT SILVER  KING                         
0274321A            0274321,SIMPLY PERFECT SILVER  KING                         
0                   0                                                           
1478_3_2            Down Pillows                                                
1478_3              Down Products                                               
1478                White Goods                                                 
478                 Domestics                                                   
2                   Hardlines                                                   
9000                WORLDWIDE                                                   
0                   0                                                                                                   


0308148_1           0308148,CLEMENTE VII/BARRIQUE 750ML GASTELGREVE                                                     
0308148A            0308148,CLEMENTE VII/BARRIQUE 750ML GASTELGREVE             
0308148             0308148,CLEMENTE VII/BARRIQUE 750ML GASTELGREVE             
0                   0                                                           
1022_1_2            Table                                                       
1022_1              Red                                                         
1022                Wine                                                        
422                 Wine                                                        
3                   Consumables                                                  
9000                WORLDWIDE                                                   
EA  EACH                0                   0                                                                                                   
0308148A            0308148,CLEMENTE VII/BARRIQUE 750ML GASTELGREVE             
0308148A            0308148,CLEMENTE VII/BARRIQUE 750ML GASTELGREVE             
0                   0                                                           
1022_1_2            Table                                                       
1022_1              Red                                                         
1022                Wine                                                        
422                 Wine                                                        
3                   Consumables                                                 
9000                WORLDWIDE                                                   
0                   0

Third file (Output):

0249068_1           0249068,Fresco Castiglione Chianti                                                                  
0249068A            0249068,Fresco Castiglione Chianti                          
0249068             0249068,Fresco Castiglione Chianti                          
0                   0                                                           
1022_1_2            Table                                                       
1022_1              Red                                                         
1022                Wine                                                        
422                 Wine                                                        
3                   Consumables                                                 
9000                WORLDWIDE                                                   
EA  EACH                0                   0                                                                                                   
0249068A            0249068,Fresco Castiglione Chianti                          
0249068A            0249068,Fresco Castiglione Chianti                          
0                   0                                                           
1022_1_2            Table                                                       
1022_1              Red                                                         
1022                Wine                                                        
422                 Wine                                                        
3                   Consumables                                                 
9000                WORLDWIDE                                                   
0                   0                                                                                                   


0308148_1           0308148,CLEMENTE VII/BARRIQUE 750ML GASTELGREVE                                                     
0308148A            0308148,CLEMENTE VII/BARRIQUE 750ML GASTELGREVE             
0308148             0308148,CLEMENTE VII/BARRIQUE 750ML GASTELGREVE             
0                   0                                                           
1022_1_2            Table                                                       
1022_1              Red                                                         
1022                Wine                                                        
422                 Wine                                                        
3                   Consumables                                                  
9000                WORLDWIDE                                                   
EA  EACH                0                   0                                                                                                   
0308148A            0308148,CLEMENTE VII/BARRIQUE 750ML GASTELGREVE             
0308148A            0308148,CLEMENTE VII/BARRIQUE 750ML GASTELGREVE             
0                   0                                                           
1022_1_2            Table                                                       
1022_1              Red                                                         
1022                Wine                                                        
422                 Wine                                                        
3                   Consumables                                                 
9000                WORLDWIDE                                                   
0                   0

a bit of a longhand - making some assumptions based on the file samples.....

awk -f mia.awk file1.txt file2.txt

where mia.awk is:

FNR==NR {
  if (split($1,a, "_") ==3)
     f1[$1]
  next
}
NF{
  if (split($1, a, "_") ==3 && $1 in f1)
     p=1
  rec=(!rec)?$0:rec ORS $0
  next
}
p{
    print rec ORS
    p=0; rec=""
}
1 Like

Thanks @vgersh99

Tried to do so but got the following error:

mia.awk: {:  not found
mia.awk[2]: syntax error at line 2 : `(' unexpected

Can you please break down to me what is happening in the script?

make sure you're copy/pasting the entire script without missing anything.
Works here just fine...

Thanks again. I think there is a misunderstanding, each "paragraph" in the file represents a record (1 line). So item is columns 1-9 and department is columns 361-369. Of course, I tried to copy-paste but it just turned it into a paragraph. For readability purposes, I sperated the records by a space.

Well... then provide A REAL sample file withOUT any editing...

Sent from my iPhone using Tapatalk

It would be impossible to do so as there is a limitation on the width when I try to post it. Any chance you can please explain the steps so I can try to edit it to suit my needs?

put in a file and attach it to the thread.
The current implementation is based on you "mock-up" and might not be relevant at all to the issue at hand.

1 Like

I have the two files zipped. When you unzip, you will see prod.dat and prod_last.dat, they are different. In records 1-3, have different department numbers (columns 361-369), all those 3 records with matching item numbers (columns 1-9) in the other file would need to be filtered out in the output. Thanks again.

not sure of the desired output... try switching the order of the prod.dat and prod_last.dat files on a cli.

awk '
{s=substr($0,361,9)}
FNR==NR {
  f1=$1
  next
}
NF && s in f1 && f1==$1' prod.dat  prod_last.dat

Sorry for bothering you again, but this returned 176 records out of 1885, the desired output would be 1882 records, as only 3 items have different department IDs between the two files.

there you go:

awk '
{dep=substr($0,361,9)}
FNR==NR {
  f1[$1]=dep
  next
}
NF && $1 in f1 && f1[$1]==dep' prod.dat  prod_last.dat
1 Like

Now, when I replace the item number (col 1-9), or replace the order of the items, those records were filtered out as well. They won't always be in the same order, or new items might come in. I just need those with different dept numbers (col 361-369) to be filtered out. Anyway that can be done?

the order of the line is ANY of the 2 files doesn't matter.
I'm keying on item num (1-9) with value of dep (361-369).

If item num (1-9) exists in file1 and file2, but have different dep value (361-369) in file2, this record/line will be filtered out.

That's the basics of the script. What am I missing?
Maybe providing a small enough (manageable) snippets of both files covering all expected scenarios and the expected out could help....

Ok I have attached another zip file, both files have 6 records, and the desired output (prod_new.dat) has 4 records. In prod.dat there are 2 records with item numbers that do not exist in prod_last.dat, and that is fine, I need that included in the output. But there are also 2 records with matching item #s but different department numbers, I would need that filtered out.

When I ran your code it filtered all 4 records and included only 2.

modified code and the order of the files matters.

awk '
{dep=substr($0,361,9)}
FNR==NR {
  f1[$1]=dep
  next
}
NF && (!($1 in f1) || f1[$1]==dep)' prod_last.dat prod.dat
1 Like