Shell script to merge the lines between two patterns and print the entire file

Hi all,

I have a CSV file (File1), shown below:

Module Statements,Module,Collection of primitive building blocks and instances of other modules that comprise a network and or instrument. Two types handoff or internal,A redefinition of a module_def shall overwrite a previously defined module_def having the same module_name within the same nameSpace_name.,6.4.5.a),
Module Statements,,,A module_def shall have at least one port_def unless it is the top module and contains an AccessLink statement,6.4.5.b),
Module Statements,,,"The following objects shall have unique names within a module_def:
port_name
? scanRegister_name
? dataRegister_name
? one_hot_scan_group_name
? scanMux_name
? dataMux_name
? clockMux_name
? one_hot_data_group_name
? logicSignal_name
? alias_name",6.4.5.c),
Module Statements,,,"The following objects shall have unique names within a module_def:
instance_name
? scanInterface_name",6.4.5.d),
Module Statements,,,"An inputPort_name shall be one of the following
scanInPort_name
? shiftEnPort_name
? captureEnPort_name
? updateEnPort_name
? dataInPort_name
? selectPort_name
? resetPort_name
? tmsPort_name
? tckPort_name
? clockPort_name
? trstPort_name
? addressPort_name
? writeEnPort_name
? readEnPort_name
",6.4.5.e),
Module Statements,,,"An outputPort_name shall be one of the following:
? scanOutPort_name
? dataOutPort_name
? toShiftEnPort_name
? toUpdateEnPort_name
? toCaptureEnPort_name
? toSelectPort_name
? toResetPort_name
? toTckPort_name
? toTmsPort_name
? toClockPort_name
? toTrstPort_name
? toIRSelectPort_name",6.4.5.f),
Module Statements,,,Multiple inputPort_name port functions that are of type vector_id may share the same SCALAR_ID as long as the indexes form a contiguous range and every index range is of the same either ascending or descending type.,6.4.5.g),
Module Statements,,,Multiple outputPort_name port functions that are of type vector_id may share the same SCALAR_ID as long as the indexes form a contiguous range and every index range is of the same either ascending or descending type.,6.4.5.h),

Search patterns:
Pattern 1: Module Statements
Pattern 2: 6.*

Expected output file:

Module Statements,Module,Collection of primitive building blocks and instances of other modules that comprise a network and or instrument. Two types handoff or internal,A redefinition of a module_def shall overwrite a previously defined module_def having the same module_name within the same nameSpace_name.,6.4.5.a),
Module Statements,,,A module_def shall have at least one port_def unless it is the top module and contains an AccessLink statement,6.4.5.b),
Module Statements,,,"The following objects shall have unique names within a module_def: port_name ? scanRegister_name ? dataRegister_name ? one_hot_scan_group_name ? scanMux_name ? dataMux_name ? clockMux_name ? one_hot_data_group_name ? logicSignal_name ? alias_name",6.4.5.c),
Module Statements,,,"The following objects shall have unique names within a module_def: instance_name ? scanInterface_name",6.4.5.d),
Module Statements,,,"An inputPort_name shall be one of the following scanInPort_name ? shiftEnPort_name ? captureEnPort_name ? updateEnPort_name ? dataInPort_name ? selectPort_name ? resetPort_name ? tmsPort_name ? tckPort_name ? clockPort_name ? trstPort_name ? addressPort_name ? writeEnPort_name ? readEnPort_name ",6.4.5.e),
Module Statements,,,"An outputPort_name shall be one of the following: ? scanOutPort_name ? dataOutPort_name ? toShiftEnPort_name ? toUpdateEnPort_name ? toCaptureEnPort_name ? toSelectPort_name ? toResetPort_name ? toTckPort_name ? toTmsPort_name ? toClockPort_name ? toTrstPort_name ? toIRSelectPort_name",6.4.5.f),
Module Statements,,,Multiple inputPort_name port functions that are of type vector_id may share the same SCALAR_ID as long as the indexes form a contiguous range and every index range is of the same either ascending or descending type.,6.4.5.g),
Module Statements,,,Multiple outputPort_name port functions that are of type vector_id may share the same SCALAR_ID as long as the indexes form a contiguous range and every index range is of the same either ascending or descending type.,6.4.5.h),

I tried the following code:

awk '/Module State/{a=1;next}/6.4/{a=0}1' FILE.csv

Thanks and Regards
Kshitij Kulshreshtha

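I think you want to use a as a state variable and act according to its state.
First, it must be 6\.4 in a regular expression, because an unescaped dot means "any character".
Second, the action that uses a is missing: a is set but never tested, so the trailing 1 simply prints every line except the Module State lines, which next skips.
If the action is to append the current line, it could be: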

awk 'a{buf=(buf FS $0)} /Module State/{a=1; buf=$0} /6\.4/{a=0; print buf}' FILE.csv

I use a buf variable:

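  • if a == 1 then append the current line to buf with a leading space (FS, which is a space by default)
  • if Module State matches then set a to 1 and store the line in buf
  • if 6.4 matches then set a to 0 and print buf

The a rule comes first, so the Module State line is not appended to buf a second time.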

In this case it's doable without buffering:

awk 'a{printf " %s", $0} /Module State/{a=1; printf "%s", $0} /6\.4/{a=0; printf "\n"}' FILE.csv

  • if a == 1 then print the current line with a leading space and no trailing newline (append)
  • if Module State matches then set a to 1 and print the current line without a newline
  • if 6.4 matches then set a to 0 and print the terminating newline
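Both one-liners above print only the lines that belong to a record, while the thread title asks to print the entire file. A minimal sketch, assuming that lines outside a Module Statements ... 6.4 range should pass through unchanged and that a record missing its end marker should still be printed at end of file (merge_records is a hypothetical helper name):

```shell
# merge_records FILE: join each "Module Statements" ... "6.4" range into one
# line and pass every other line through unchanged.
# (merge_records is a hypothetical helper name, not from the thread.)
merge_records() {
  awk '
    /Module Statements/           { inrec = 1; buf = $0 }          # start of a record
    inrec && !/Module Statements/ { buf = buf FS $0 }              # continuation: append
    inrec && /6\.4/               { print buf; inrec = 0; next }   # end of a record
    !inrec                        { print }                        # outside any record
    END { if (inrec) print buf }  # flush a record that never saw its end marker
  ' "$1"
}
```

Usage: merge_records FILE.csv > merged.csv. A start line that also contains 6.4 is stored and printed by the same pass, so single-line records need no special case.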

I see opening and closing double quotes around the multi-line fields, so maybe my tip is all you need.
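One way to use the quotes instead of the text patterns, sketched under the assumption that the file follows the usual CSV convention (a field containing newlines is enclosed in double quotes, and a literal quote inside a field is doubled as ""): keep joining lines while an odd number of double quotes has been seen. join_quoted is a hypothetical name:

```shell
# join_quoted FILE: join physical lines until every double-quoted field is closed.
# Assumes standard CSV quoting: a literal quote inside a field is written as "",
# which adds an even number of quotes and therefore does not change the state.
join_quoted() {
  awk '
    {
      t = $0
      n = gsub(/"/, "", t)            # count the double quotes on this line
      rec = (inq ? rec FS $0 : $0)    # continue the pending record or start a new one
      if (n % 2) inq = !inq           # an odd count opens or closes a quoted field
      if (!inq) print rec             # all quotes closed: the record is complete
    }
    END { if (inq) print rec }        # unbalanced quote at EOF: print what we have
  ' "$1"
}
```

This does not depend on the Module Statements or 6.4 text at all, so it also works for rows whose section number does not start with 6.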

Thanks a lot for the nice solution, it worked.

Now I need to write a perl script for the same task.
I wrote the following, but it does not give the right result:

Perl script

#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;
use feature 'say';

my @collect;
my $file = $ARGV[0] or die;
open(my $DATA, '<', $file) or die;

while (<$DATA>) {
  chomp;
  # If we're between our markers...
  if (/Module Statements/ .. /6.4*/) {
    # At the start marker, empty the array
    if (/Module Statements/) {
      @collect = ();
    # At the end marker, print the array
    } elsif (/6.4*/) {
      say join ' ', @collect;
    # Otherwise, push the line onto the array
    } else {
      push @collect, $_;
      foreach my $m (@collect) {
        print $m;
      }
    }
  # Otherwise, just print the line
  } else {
    say;
  }
}


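Why did you ignore my hint? The dot must be escaped: /6\.4/. In /6.4*/ the dot matches any character and 4* means "zero or more 4s", so the pattern matches any 6 that is followed by one more character.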

You can translate the awk code directly into perl -lne code; both loop over the input file line by line.
But if you prefer an explicit while loop...

I wonder why?
What's wrong with awk?
But either way, consider a2p.


I didn't know about a2p. It's written in C, not in perl. Anyway, good to know.
As an exercise, do the translation manually, and you will learn both awk and perl!

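Apropos perl:
I am surprised that the .. range operator works in a while loop.
In each iteration $_ is just the current line, so the .. must keep a hidden state variable. (perlop confirms this: in scalar context .. is a flip-flop operator, and each .. keeps its own boolean state.)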

Regarding your perl code:

  • the foreach loop should not be there: after every push it prints the whole array again, without newlines.
  • the Module Statements line and the 6.4 line never reach the array, because the push is in the else branch.
  • when Module Statements and 6.4 are on the same line, the array is never printed, because the end-marker test is in an elsif branch.

Your perl code corrected:

#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';

my @collect;
my $file = $ARGV[0] or die "Usage: $0 FILE.csv\n";
open(my $DATA, '<', $file) or die "Cannot open $file: $!\n";

while (<$DATA>) {
  chomp;
  # If we're between our markers...
  if (/Module Statements/ .. /6\.[0-9]/) {
    # At the start marker, empty the array
    if (/Module Statements/) {
      @collect = ();
    }
    # Always(!) push the line onto the array
    push @collect, $_;
    # At the end marker, print the array
    if (/6\.[0-9]/) {
      say join ' ', @collect;
    }
  # Otherwise, just print the line
  } else {
    say;
  }
}
close($DATA);

In the border case where Module Statements and 6.4 are on the same line, it clears the array, adds the line to it, then prints it.

