To extract certain columnns with header

Hi,
I have data which has many columns but i need to extract only three column from below all records have one space in begining.

i need to extract only column as below ,any tric awk or sed trick?, please advise.

cloumn no : 3 : ORD NO
coulmn no : 6: P A R T  N U M B E R
coulmn no : 9 : INV NO / SER NO
0    I I C     P A R T  N U M B E R    KEYWORD      J A N     ORD NO   DOC NO DDMM INV NO / SER NO VENDOR UM     Q T Y   TOTAL VALUE
 ------------ ------------------------ -------- ------------ --------- ------ ---- --------------- ------ -- ---------- ------------
 424009024545 PEF0AM1MX2MX40MM         PLAIN PE              490117701 110024 0112 8916            280819 58      20.0        850.00
 424009020858 3M7447                   SCOTCHBR              490118901 110025 0112 SM0883          280799 29       9.0       1181.25
 424009024756 SAFETYC0NE30IN           PVC TRAF              490126001 110026 0112 1412008         280477 01      10.0        450.00
 424009022922 C0TT0NRAG                COTTON R              490105304 110027 0112 13264           280696 48     100.0       4200.00

How about this very simple approach for exactly the problem you posted:

awk '
NR == 1         {match ($0, HD1)
                 P1 = RSTART
                 L1 = RLENGTH
                 match ($0, HD2)
                 P2 = RSTART
                 L2 = RLENGTH
                 match ($0, HD3)
                 P3 = RSTART
                 L3 = RLENGTH
                }
                {print substr ($0, P1, L1), substr ($0, P2, L2), substr ($0, P3, L3)
                }
' HD1=" ORD NO *" HD2=" P A R T  N U M B E R *" HD3="INV NO / SER NO *" file
 ORD NO     P A R T  N U M B E R     INV NO / SER NO 
---------  ------------------------  --------------- 
490117701  PEF0AM1MX2MX40MM          8916            
490118901  3M7447                    SM0883          
490126001  SAFETYC0NE30IN            1412008         
490105304  C0TT0NRAG                 13264           
3 Likes

Thank you Rudie , have partly submitted my query earlier in a hurry my actual data content as below.

1
     LIST 950C/M                         Jeffron aliff system
     MASTERS                                       ISSUE    REPORT (MONTHLY)               01JAN15      PAGE      1
0    MOVEMENT CODE  26     
0    STOCK A/C NO   1400100
0    I I C     P A R T  N U M B E R    KEYWORD      J A N     ORD NO   DOC NO DDMM INV NO / SER NO VENDOR UM     Q T Y   TOTAL VALUE
 ------------ ------------------------ -------- ------------ --------- ------ ---- --------------- ------ -- ---------- ------------
 424009024545 PEF0AM1MX2MX40MM         PLAIN PE              490117701 110024 0112 8916            280819 58      20.0        850.00
 424009020858 3M7447                   SCOTCHBR              490118901 110025 0112 SM0883          280799 29       9.0       1181.25
 424009024756 SAFETYC0NE30IN           PVC TRAF              490126001 110026 0112 1412008         280477 01      10.0        450.00
 424009022922 C0TT0NRAG                COTTON R              490105304 110027 0112 13264           280696 48     100.0       4200.00
 436003010191 ARINUS0940               SANITISI              490121901 120046 0112 D01100477       280708 52       4.0        200.00
 440101001488 8W550C3                  PRIMER S              421751301 397174 0212 1557            S22112 39       1.0        459.59
 1
     LIST 950C/M                         Jeffron aliff system
     MASTERS                                       ISSUE   ANALYSIS   REPORT (MONTHLY)               01JAN15      PAGE      2
0    MOVEMENT CODE  26     
0    STOCK A/C NO   1400100
0    I I C     P A R T  N U M B E R    KEYWORD      J A N     ORD NO   DOC NO DDMM INV NO / SER NO VENDOR UM     Q T Y   TOTAL VALUE
 ------------ ------------------------ -------- ------------ --------- ------ ---- --------------- ------ -- ---------- ------------
 440101000437 BR127                    PRIMER                421755201 397185 0512 3027490         D24170 15       2.0       1410.72
 440101000578 SIKKENSN0NSLIPGREY       SIKKENS               421752701 361076 0512 15668           S22112 15       5.0       1508.94
 424009025224 WHEELCAST0R8X2SWIVEL     WHEEL CA              490127301 110047 0512 111931          290040 01      20.0        960.00
 424009025225 SELFTAPINGSCREW112IN     SELF TAP              490127302 110048 0512 111931          290040 52       1.0         26.00
 424009020343 BRUSHBAMB002INSX6STICK   BAMBOO S              490126701 110049 0512 111896          290040 01      20.0         80.00
1
     LIST 950C/M                         Jeffron aliff system
     MASTERS                                       ISSUE   ANALYSIS   REPORT (MONTHLY)               01JAN15      PAGE      3
0    MOVEMENT CODE  26     STORING OF ORDERED MATERIAL
0    STOCK A/C NO   1400100
0    I I C     P A R T  N U M B E R    KEYWORD      J A N     ORD NO   DOC NO DDMM INV NO / SER NO VENDOR UM     Q T Y   TOTAL VALUE
 ------------ ------------------------ -------- ------------ --------- ------ ---- --------------- ------ -- ---------- ------------
 440006000061 EP0CAST1619AB            ADHESIVE              421766901 397205 1212 3029548         D24170 39       1.0        344.28
 440002000198 PR1422B2                 SEALANT               421766902 397206 1212 3029548         D24170 39       2.0        209.92
 440001000246 DUNL0PLP                 DUNLOP L              421750101 381910 1212 D01341MY1       280911 15      50.0       2350.00
 440306000045 SKDS2                    MAGNAFLU              421761201 381909 1212 18036089        290065 15     204.0       6120.00
 440202000015 ASG223KIL0               AEROSHEL              421756401 334607 1212 30854           D24170 15      33.0      10391.53
 440002000044 PS870A2                  PS870A2               421724801 397208 1212 QA213552014     S23196 39      48.0       5962.92

and my requirement is to pick header named as mentioned earlier [3 column] and all records by skipping below content.

1
     LIST 950C/M                         Jeffron aliff system
     MASTERS                                       ISSUE   ANALYSIS   REPORT (MONTHLY)               01JAN15      PAGE      2
0    MOVEMENT CODE  26     
0    STOCK A/C NO   1400100

three column header only once and from the next line all records of the file
all records which i need to pick having one space in beginning the rest.

---------- Post updated at 07:24 AM ---------- Previous update was at 06:20 AM ----------

So output should be as below

0    I I C     P A R T  N U M B E R    KEYWORD      J A N     ORD NO   DOC NO DDMM INV NO / SER NO VENDOR UM     Q T Y   TOTAL VALUE
 ------------ ------------------------ -------- ------------ --------- ------ ---- --------------- ------ -- ---------- ------------
 424009024545 PEF0AM1MX2MX40MM         PLAIN PE              490117701 110024 0112 8916            280819 58      20.0        850.00
 424009020858 3M7447                   SCOTCHBR              490118901 110025 0112 SM0883          280799 29       9.0       1181.25
 424009024756 SAFETYC0NE30IN           PVC TRAF              490126001 110026 0112 1412008         280477 01      10.0        450.00
 424009022922 C0TT0NRAG                COTTON R              490105304 110027 0112 13264           280696 48     100.0       4200.00
 436003010191 ARINUS0940               SANITISI              490121901 120046 0112 D01100477       280708 52       4.0        200.00
 440101001488 8W550C3                  PRIMER S              421751301 397174 0212 1557            S22112 39       1.0        459.59
 440101000437 BR127                    PRIMER                421755201 397185 0512 3027490         D24170 15       2.0       1410.72
 440101000578 SIKKENSN0NSLIPGREY       SIKKENS               421752701 361076 0512 15668           S22112 15       5.0       1508.94
 424009025224 WHEELCAST0R8X2SWIVEL     WHEEL CA              490127301 110047 0512 111931          290040 01      20.0        960.00
 424009025225 SELFTAPINGSCREW112IN     SELF TAP              490127302 110048 0512 111931          290040 52       1.0         26.00
 424009020343 BRUSHBAMB002INSX6STICK   BAMBOO S              490126701 110049 0512 111896          290040 01      20.0         80.00
 440006000061 EP0CAST1619AB            ADHESIVE              421766901 397205 1212 3029548         D24170 39       1.0        344.28
 440002000198 PR1422B2                 SEALANT               421766902 397206 1212 3029548         D24170 39       2.0        209.92
 440001000246 DUNL0PLP                 DUNLOP L              421750101 381910 1212 D01341MY1       280911 15      50.0       2350.00
 440306000045 SKDS2                    MAGNAFLU              421761201 381909 1212 18036089        290065 15     204.0       6120.00
 440202000015 ASG223KIL0               AEROSHEL              421756401 334607 1212 30854           D24170 15      33.0      10391.53
 440002000044 PS870A2                  PS870A2               421724801 397208 1212 QA213552014     S23196 39      48.0       5962.92

Any attempts / ideas / thoughts from your side to adapt the simple solution proposed?

And, I thought you wanted three columns?

Hi Rudie,
Yes i want those three columns only but i wanted to forgot the paste the full content of the data earlier my way is pick only those lines who has only 1 space in begining rest lines by sed so rest of the records of will be ignored and then apply awk, your assistance is much appreciated.

Thanks.

The data seem to be from an (old?) FORTRAN print file with carriage control characters as the first character in a line being interpreted / suppressed at printout. In principle, they should be eliminated in processing as well.
Picking lines by your criterion "single space" would not suppress the dash lines, so I use the 1 to synchronize to the page breaks and then suppress the following 5 resp. 7 lines. You can select any number of columns by adding their headers to the COLUMNS variable, separated by commas. Try

awk '
/^1/            {SKIP = NR + 5 + HDFND
                }
NR < SKIP       {next
                }
!HDFND          {MX = split (COLUMNS, HD, ",")
                 for (i=1; i<=MX; i++)  {match ($0, HD " *")
                                         P = RSTART
                                         L = RLENGTH
                                        }
                 HDFND = 2
                }
                {for (i=1; i<=MX; i++)  printf "%s ", substr ($0, P, L)
                 printf RS
                }
' COLUMNS=" ORD NO, P A R T  N U M B E R,INV NO / SER NO" file
 ORD NO     P A R T  N U M B E R     INV NO / SER NO  
---------  ------------------------  ---------------  
490117701  PEF0AM1MX2MX40MM          8916             
490118901  3M7447                    SM0883           
490126001  SAFETYC0NE30IN            1412008          
490105304  C0TT0NRAG                 13264            
490121901  ARINUS0940                D01100477        
421751301  8W550C3                   1557             
421755201  BR127                     3027490          
421752701  SIKKENSN0NSLIPGREY        15668            
490127301  WHEELCAST0R8X2SWIVEL      111931           
490127302  SELFTAPINGSCREW112IN      111931           
490126701  BRUSHBAMB002INSX6STICK    111896           
421766901  EP0CAST1619AB             3029548          
421766902  PR1422B2                  3029548          
421750101  DUNL0PLP                  D01341MY1        
421761201  SKDS2                     18036089         
421756401  ASG223KIL0                30854            
421724801  PS870A2                   QA213552014      

Hi Rudie,
Thank you very much it is working as expected , got some additional requirement on top of it.

  1. records should be picked only for the movement code 26 as mentioned in top of every record header ,can be hard coded.
    LIST 950C/M                         JEFFile SYS T E M
     MASTERS                                       ISSUE   ANALYSIS   REPORT (MONTHLY)               01JAN15      PAGE      2
0    MOVEMENT CODE  26     STORING OF ORDERED MATERIAL
0    STOCK A/C NO   1400100
0    I I C     P A R T  N U M B E R    KEYWORD      J A N     ORD NO   DOC NO DDMM INV NO / SER NO VENDOR UM     Q T Y   TOTAL VALUE
  1. addition column only 2 digits needs to display from DOC NO so as per below record 39,36,11,11
0    I I C     P A R T  N U M B E R    KEYWORD      J A N     ORD NO   DOC NO DDMM INV NO / SER NO VENDOR UM     Q T Y   TOTAL VALUE
 ------------ ------------------------ -------- ------------ --------- ------ ---- --------------- ------ -- ---------- ------------
 440101000437 BR127                    PRIMER                421755201 397185 0512 3027490         D24170 15       2.0       1410.72
 440101000578 SIKKENSN0NSLIPGREY       SIKKENS               421752701 361076 0512 15668           S22112 15       5.0       1508.94
 424009025224 WHEELCAST0R8X2SWIVEL     WHEEL CA              490127301 110047 0512 111931          290040 01      20.0        960.00
 424009025225 SELFTAPINGSCREW112IN     SELF TAP              490127302 110048 0512 111931          290040 52       1.0         26.00
  1. To display DDMM colmmn also but with year in file records only date and month i.e 05 date and month 12 but year also needs be appended.
    year logic is to read from the file name.
    file name = R950CMA_01JAN15 so data is for the past month means dec14 so year is 14 in a same way go on..

Final output is expected as below.

ORD NO				P A R T  N U M B E R 		DDMM		INV NO / SER NO		DOC NO
421755201			BR127					051214		3027490			39
421752701			SIKKENSN0NSLIPGREY 		051214		15668				36
490127301			WHEELCAST0R8X2SWIVEL		051214		111931				11
490127302			SELFTAPINGSCREW112IN  		051214		111931				11

---------- Post updated at 06:57 AM ---------- Previous update was at 01:55 AM ----------

HI Rudie,
Any luck on above?

I'd propose you try your luck and post back where you're stuck.

1 Like