AWK: Grep Pattern and print help

rtsiahaan · February 27, 2012, 11:47am

I want to get output like this from a big file, matching the patterns shown:
Line FSP LSP SR RL
Test1 100 300 4 4000
Test2 1 300 2 300

Any help is greatly appreciated. Thank you.

Corona688 · February 27, 2012, 11:52am

You want to grep three different lines, or one group of three lines?

awk '/^Line/ { print ; getline ; print; getline ; print }' filename

rtsiahaan · February 27, 2012, 11:58am

Great, almost there. I want the values on a single row for each unique Line instead of on separate lines.

Thanks.

Corona688 · February 27, 2012, 12:10pm

I think you need to explain what you want better. This will do what you asked for, but probably not what you want.

awk '# Get list of columns from first line
NR==1 { for(N=1; N<=NF; N++) { A[N]=$N } next }

# Add them as column prefixes to all other lines
{ for(N=1; N<=NF; N++) $N = A[N] ": " $N; } 1' file
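
To make it concrete what a column-prefix script of this kind produces, here is a minimal sketch. The file name cols_sample.txt, the function name prefix_columns and the array name title are made up for illustration, and the sample rows are the ones from the first post.

```shell
# Hypothetical sample file: a header row followed by two data rows.
printf '%s\n' 'Line FSP LSP SR RL' 'Test1 100 300 4 4000' 'Test2 1 300 2 300' > cols_sample.txt

# prefix_columns FILE: remember the titles from row 1, then prefix every
# field of the remaining rows with the title of its column.
prefix_columns() {
    awk 'NR == 1 { for (n = 1; n <= NF; n++) title[n] = $n; next }
         { for (n = 1; n <= NF; n++) $n = title[n] ": " $n; print }' "$1"
}

prefix_columns cols_sample.txt
```

The first data row comes out as "Line: Test1 FSP: 100 LSP: 300 SR: 4 RL: 4000", which is the reverse of the table layout asked for later in the thread.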

rtsiahaan · February 27, 2012, 12:26pm

I want to get a result that I can eventually put into Excel.
Header---> LineName FSP LSP SR RL
Result----> Test1 100 300 4 4000
Result----> Test2 1 300 3 3000
and so on...

Corona688 · February 27, 2012, 1:01pm

I did get it exactly backwards then.

Your data is not consistent: some of it is "this: value", some is "this : value". Some has "A line", some has "Line". Which is true?

Also: Will columns always arrive in the same order, or not?

Assuming the order of columns isn't changing all the time:

$ cat data

Line: 100 has FSP: 100 with LSP: 300 and SR: 4 and RL: 4000
Line: 200 has FSP: 1 with LSP: 300 and SR: 3 and RL: 3000

$ cat rearrange.awk

BEGIN { FS="\t"; OFS="\t" }
{
        # Replace the various has/and/with and their spaces with a single tab.
        gsub(/ *(has|and|with) */, "\t");
        # Eliminate some more extra spaces.
        gsub(/ *: */, ":");

        # Now that we have them arranged in sane columns:
        PF=""
        for(N=1; N<=NF; N++)
        {
                # Split NR:1 into A[1]="NR", A[2]="1"
                split($N, A, ":");

                # Print row of titles if this is the first line
                if(NR == 1)
                {
                        printf(PF "%s", A[1]);
                        PF=FS;
                }

                # Set the column to the value and nothing else
                $N=A[2];
        }

        if(NR==1)       printf("\n");

        # Print the row of data for every single line
} 1

$ awk -f rearrange.awk data

Line    FSP     LSP     SR      RL
100     100     300     4       4000
200     1       300     3       3000

$

rtsiahaan · February 27, 2012, 2:41pm

The patterns I want to grep are essentially these: /Line:/, then print the value after Line: (i.e. 100); next /FSP:/ and print its value, 100; next /LSP:/ and print 300; then /SR:/ prints 4 and /RL:/ prints 4000. The next line then repeats. I don't really care how much spacing separates the results as long as they are separated. At the end I would echo a title line, LINENAME FSP LSP SR RL, and the results from the pattern grep above would go underneath it.

Corona688 · February 27, 2012, 5:19pm

Does the code I already gave you work or not?

What would this 'results line' look like? What data would you want printed on it? How would you calculate this data from previous data?

Don't tell me how you want to do it, tell me what your data looks like and what your output looks like... You've posted 3 versions of what they might look like now, some of which still have obvious typos, and none of which agree with each other.

You also didn't answer any questions I actually asked...

rtsiahaan · February 28, 2012, 9:19am

Sorry about not answering your questions. It works, but it did not output in a row as desired. I will post the outcome of the scripts provided. Thanks.


awk '/^Line:/ {print ; getline ; print; getline ; print}' log1.txt
Line: 100
FSP: 10
LSP: 200
Line: 200
FSP: 20
LSP: 300
Line: 300
FSP: 10
LSP: 100

more log1.txt

Line: 100
FSP: 10
LSP: 200

Line: 200
FSP: 20
LSP: 300

Line: 300
FSP: 10
LSP: 100

more log1.scr

BEGIN { FS="\t"; OFS="\t" }
{
        # Replace the various has/and/with and their spaces with a single tab.
        gsub(/ *(has|and|with) */, "\t");
        # Eliminate some more extra spaces.
        gsub(/ *: */, ":");

        # Now that we have them arranged in sane columns:
        PF=""
        for(N=1; N<=NF; N++)
        {
                # Split NR:1 into A[1]="NR", A[2]="1"
                split($N, A, ":");

                # Print row of titles if this is the first line
                if(NR == 1)
                {
                        printf(PF "%s", A[1]);
                        PF=FS;
                }

                # Set the column to the value and nothing else
                $N=A[2];
        }

        if(NR==1)       printf("\n");

        # Print the row of data for every single line
} 1

awk -f log1.scr log1.txt

Line
100
10
200

200
20
300

300
10
100
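
For input shaped like log1.txt, with one "Label: value" pair per line and a blank line between blocks, the values can be gathered into one row per block. This is only a sketch under those assumptions: the file name log1_sample.txt and the names rows_from_blocks and emit are invented for the example, and it takes the header from the labels of the first block.

```shell
# Hypothetical sample in the same shape as log1.txt.
cat > log1_sample.txt <<'EOF'
Line: 100
FSP: 10
LSP: 200

Line: 200
FSP: 20
LSP: 300
EOF

# rows_from_blocks FILE: collect labels and values line by line and
# print one tab-separated row each time a block ends.
rows_from_blocks() {
    awk -F': *' '
    # Print the pending row; the header is printed once, before the first row.
    function emit() {
        if (row == "") return
        if (!printed++) print header
        print row
        row = header = ""
    }
    NF == 0 { emit(); next }    # a blank line ends the current block
    {
        header = header (header == "" ? "" : "\t") $1
        row    = row    (row    == "" ? "" : "\t") $2
    }
    END { emit() }              # the last block may not be followed by a blank line
    ' "$1"
}

rows_from_blocks log1_sample.txt
```

On the sample this prints a "Line FSP LSP" header followed by the rows "100 10 200" and "200 20 300", all tab-separated.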

Corona688 · February 28, 2012, 10:37am

Your input data doesn't resemble anything you posted, then.

If you won't post any of it unmangled, can you at least attach some of it?

rtsiahaan · February 28, 2012, 11:31am

Thanks for your response.

Sorry, I was just trying to simplify the log file. The actual log file is attached (again truncated, with only 2 datasets; I have more than 300). The key values I want to capture are line name, ORIGINAL TAPE INPUT, number of traces, sample rate, end time, first trace number, last trace number, first shotpoint and last shotpoint. The desired output is one row per dataset.

line name = 76-145-294
ORIGINAL TAPE INPUT 856342
number of traces = 4812
trace header length = 240
number of samples = 2400
start time = 0 msec
end time = 4798 msec
sample rate = 2 msec
sample format = 1   (IBM Real)
trace number extracted from byte number 21, format Integer 4-Byte
  first trace number = 4155
  last trace number = 6375
  trace number increment = 5
shotpoint number extracted from byte number 17, format Integer 4-Byte
  first shotpoint = 424
  last shotpoint = 669
  shotpoint increment = 1
x coordinate extracted from byte number 73, format Integer 4-Byte
  first x coordinate =         65.00
  last x coordinate =        378.00
y coordinate extracted from byte number 77, format Integer 4-Byte
  first y coordinate =       3739.00
  last y coordinate =      57428.00
3D line number extracted from byte number 9, format Integer 4-Byte
  smallest 3D line value = 171001
  largest 3D line value = 174020
3D trace number extracted from byte number 21, format Integer 4-Byte
  smallest 3D trace value = 4155
  largest 3D trace value = 6375
minimum amplitude in file = -529.406
maximum amplitude in file = 529.375



line name = 76-145-294
ORIGINAL TAPE INPUT 856341
number of traces = 4800
trace header length = 240
number of samples = 2400
start time = 0 msec
end time = 4798 msec
sample rate = 2 msec
sample format = 1   (IBM Real)
trace number extracted from byte number 21, format Integer 4-Byte
  first trace number = 6380
  last trace number = 8695
  trace number increment = 5
shotpoint number extracted from byte number 17, format Integer 4-Byte
  first shotpoint = 638
  last shotpoint = 858
  shotpoint increment = -1
x coordinate extracted from byte number 73, format Integer 4-Byte
  first x coordinate =        340.00
  last x coordinate =        561.00
y coordinate extracted from byte number 77, format Integer 4-Byte
  first y coordinate =      50609.00
  last y coordinate =      98810.00
3D line number extracted from byte number 9, format Integer 4-Byte
  smallest 3D line value = 174005
  largest 3D line value = 177027
3D trace number extracted from byte number 21, format Integer 4-Byte
  smallest 3D trace value = 6380
  largest 3D trace value = 8695
minimum amplitude in file = -529.406
maximum amplitude in file = 529.375

Corona688 · February 28, 2012, 11:53am

This is indeed extremely different, since all the data you posted before had all the values on one line...

You still haven't explained the kind of summary you want after all the other data.

Working on it.

rtsiahaan · February 28, 2012, 12:20pm

The desired output would be as follows:

76-145-294 856342 4812 2 4798 4155 6375 424 669
76-145-294 856341 4800 2 4798 6380 8695 638 858

Thanks

Corona688 · February 28, 2012, 12:43pm

Unless I misunderstood you, you also wanted some sort of summary after all lines are processed, but gave no details.

$ cat tape.awk

BEGIN {
        FS=" = ";       OFS="\t"
        # Load the list of fields you want.
        # Compare against 0 so a missing fields.txt (getline returns -1) cannot loop forever.
        while((getline < "fields.txt") > 0)
        {
                FIELD[$1]=$2
                FIELD[++N]=$2
        }
        # Remove the two lines below to not print a header line
        for(N=1; FIELD[N]; N++) $N=FIELD[N]
        print
}

# Delete needless extra spaces
{ gsub(/[ \t][ \t]+/, "");      sub(/^ +/, ""); }

# Shoehorn that one funny line into the format of the rest
/ORIGINAL TAPE INPUT/ { sub(/INPUT/, "INPUT = ");       }

# If it matches one of the fields we want, grab it.
FIELD[$1] {
        T=FIELD[$1]
        split($2, A, " ");


        # If we already have one of those, print and start over
        if(V[T])
        {
                for(N=1; FIELD[N]; N++)
                {
                        $N=V[FIELD[N]];
                        delete V[FIELD[N]]
                }
                NF=(N-1)
                print
        }

        V[T]=A[1];
}

END {
        if(V[FIELD[1]])
        {
                for(N=1; FIELD[N]; N++) $N=V[FIELD[N]];
                print
        }
}

$ cat fields.txt

line name = LINE-NAME
ORIGINAL TAPE INPUT = OTI
number of traces = NOT
sample rate = SR
end time = ET
first trace number = FTN
last trace number = LTN
first shotpoint = FSP
last shotpoint = LSP

$ awk -f tape.awk data
LINE-NAME       OTI     NOT     SR      ET      FTN     LTN     FSP     LSP
76-145-294      856342  4812    2       4798    4155    6375    424     669
76-145-294      856341  4800    2       4798    6380    8695    638     858

$
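
Since the stated goal is Excel: the output above is tab-separated, which Excel will usually import as is. If comma-separated output is preferred, a small filter along these lines could be piped after the script. It is a sketch, and the function name tsv_to_csv is made up for the example.

```shell
# tsv_to_csv: read tab-separated rows on stdin and write CSV on stdout.
# Fields containing a comma or a double quote are wrapped in double quotes,
# with embedded double quotes doubled.
tsv_to_csv() {
    awk 'BEGIN { FS = "\t"; OFS = "," }
    {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /[",]/) {
                gsub(/"/, "\"\"", $i)
                $i = "\"" $i "\""
            }
        }
        $1 = $1     # force the record to be rebuilt with OFS even if no field changed
        print
    }'
}

printf 'LINE-NAME\tOTI\tNOT\n76-145-294\t856342\t4812\n' | tsv_to_csv
```

With the thread's script it would be used as: awk -f tape.awk data | tsv_to_csv > tape.csv (tape.csv being any output name you like).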

Scrutinizer · February 28, 2012, 1:41pm

Alternatively, not very robust, but just for fun:

awk '{print $4,$8,$13,$37,$32,$59,$64,$83,$87}' RS= infile
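
The one-liner works because an empty RS puts awk in paragraph mode, where each blank-line-separated block is one record, and the numbers are then picked by word position. A variant that looks the values up by label instead of by position might look like the sketch below. It is trimmed to three labels for brevity, and the file name sample_blocks.txt and the function name tape_rows are invented for the example.

```shell
# Hypothetical sample: two blank-line-separated blocks with a few of the labels.
cat > sample_blocks.txt <<'EOF'
line name = 76-145-294
ORIGINAL TAPE INPUT 856342
number of traces = 4812

line name = 76-145-294
ORIGINAL TAPE INPUT 856341
number of traces = 4800
EOF

# RS="" selects paragraph mode; FS="\n" makes every line of a block one field.
tape_rows() {
    awk 'BEGIN { RS = ""; FS = "\n"; OFS = "\t" }
    {
        name = tape = traces = ""
        for (i = 1; i <= NF; i++) {
            line = $i
            sub(/^[ \t]+/, "", line)    # drop leading indentation
            if (line ~ /^line name = /) {
                sub(/^line name = /, "", line); name = line
            } else if (line ~ /^ORIGINAL TAPE INPUT /) {
                sub(/^ORIGINAL TAPE INPUT +/, "", line); tape = line
            } else if (line ~ /^number of traces = /) {
                sub(/^number of traces = /, "", line); traces = line
            }
        }
        print name, tape, traces
    }' "$1"
}

tape_rows sample_blocks.txt
```

Matching on labels costs a few more lines than counting words, but it keeps working if a block gains or loses a line.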

rtsiahaan · February 28, 2012, 2:38pm

Corona688,
Awesome! Quite complex scripting.
Once again thank you for your patience and help.