Awk match a multiline pattern

Hello!

I want to match, in a config file, a piece of text that spans more than one line, something like this:

CACHE_SIZE{
10000 M
}

I have problems with the line endings. I think I can match the end of a line with \n, but I can't get it to work.

Can someone help me with the regular expression?

Thanks!

Not sure if I understood what you need, but here is an example with a preceding block and a block afterwards, each separated by a blank line:

$> cat infile
eins
zwei
drei

CACHE_SIZE{
10000 M
}

vier
fuenf
sechs
$> awk '/^CACHE_SIZE/ {print}' FS="\n" RS="" infile
CACHE_SIZE{
10000 M
}

Sorry, but I can't use FS="\n", because my file has 3 different patterns and I need to use FS="\t". My file is something like this:

123M     xls     <670K
234K     doc    >800K

CACHE_SIZE{
1000M
}

I have a regexp that matches the first pattern, which is why I need FS="\t", and I also need to match the second one:

CACHE_SIZE{
1000M
}

Separate the processing of your file into two different runs?

Or check whether your line starts with something like your CACHE_SIZE{ and call getline until there is a closing curly bracket, or work with a flag that records when the bracket opens and when it closes. That way you will not need to change FS or RS for that part.
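The flag idea above could look roughly like the sketch below. The sample file contents, the variable name inblock and the output labels are assumptions made up for the illustration; FS stays "\t" for the whole run, and the header and closing brace are compared as plain strings so no regexp escaping is needed.

```shell
# Sample input modeled on the file shown earlier (tab-separated data lines).
infile=$(mktemp)
printf '123M\txls\t<670K\n234K\tdoc\t>800K\n\nCACHE_SIZE{\n1000M\n}\n' > "$infile"

out=$(awk -F'\t' '
    $0 == "CACHE_SIZE{"  { inblock = 1; next }    # header line: raise the flag
    inblock && $0 == "}" { inblock = 0; next }    # closing brace: lower the flag
    inblock              { print "cache value: " $0; next }   # lines inside the block
    NF == 3              { print "data line: " $1 " / " $2 " / " $3 }   # ordinary tab-separated lines
' "$infile")

printf '%s\n' "$out"
rm -f "$infile"
```

Because the flag is only a variable, the same pass can still apply other regexps to the ordinary lines, so nothing has to be split into two runs.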

No, I don't want to separate the processing of my file into 2 different runs. I want to run the first regexp, which returns its values to me, and then run the second regexp, but in one continuous pass, not in 2 different runs.

Then maybe go with the second suggestion I mentioned.

Ok, thanks!

I'll try to find information about how getline works.

Try this:

sed -n '/^CACHE_SIZE/  N;/\n10000 M/ N;/\n}/p; ' filename

Thanks, but I need to use awk, dennis, and I don't think I can use getline, zaxxon, because I have a config file like this:

CACHE SIZE{
cache_size=1000M
}

CRONTAB{
crontab=1 12 2 10 5
}

POLITICS{
100MF   doc,docx,xls    <5600K  02/04/02/02/02K
55MF    jpg     >300M   03/03/02/03/05K
65%F    mpg     =000M   02/02/02/05/05M
}

There are 3 different structures in my config file, and besides this, I'm reading the config file with an awk script "read_conf.awk" like this:

BEGIN {FS = "\t"
print "Starting to read the config file"
num=0;
bucle=0;

}
# first regexp
/^[0-9]*[\%|M|K|G][C|F]\t[A-Za-z|\,]*\t[\<|\>|\=][0-9]*[K|M|G]\t[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9][M|K|G]$/     {

instructions...

}

#second regexp
/^cache_size=[0-9]*[K|M|G]/     {

instructions...

}

#third regexp
/crontab=[0-9]*|\*\t[0-9]*|\*\t[0-9]*|\*\t[0-9]*|\*\t[0-9]*|\*/ {

instructions...

}

Each of these regexps reads the values between {...}, but I need to read them
together with the head and the end, in this case "POLITICS{" and "}".
The structure is "head", "regexp", "end":

POLITICS{
100MF   doc,docx,xls    <5600K  02/04/02/02/02K
55MF    jpg     >300M   03/03/02/03/05K
65%F    mpg     =000M   02/02/02/05/05M
}

Did I explain myself correctly?

Thanks

Try this

infile: -

123M     xls     <670K
234K     doc    >800K
 
dummy{
10N
}
 
CACHE_SIZE{
1000M
}
nawk ' /CACHE_SIZE{/{bo = 1}(/}/)&&(bo){print;exit}bo ' infile

Gives output: -

CACHE_SIZE{
1000M
}

Hope it helps :slight_smile:

---------- Post updated at 10:17 PM ---------- Previous update was at 08:22 PM ----------

Is this what you need?

TX5XN:/home/brad/forum/claw82>cat infile
CACHE SIZE{
cache_size=1000M
}

CRONTAB{
crontab=1 12 2 10 5
}

POLITICS{
100MF   doc,docx,xls    <5600K  02/04/02/02/02K
55MF    jpg     >300M   03/03/02/03/05K
65%F    mpg     =000M   02/02/02/05/05M
}


TX5XN:/home/brad/forum/claw82>nawk ' /CACHE SIZE{/,/}/;/CRONTAB{/,/}/;/POLITICS{/,/}/ ' infile

CACHE SIZE{
cache_size=1000M
}
CRONTAB{
crontab=1 12 2 10 5
}
POLITICS{
100MF   doc,docx,xls    <5600K  02/04/02/02/02K
55MF    jpg     >300M   03/03/02/03/05K
65%F    mpg     =000M   02/02/02/05/05M
}

---------- Post updated 14-10-09 at 10:04 AM ---------- Previous update was 13-10-09 at 10:17 PM ----------

Looking at your awk script I think you probably need to edit it to something like this: -

nawk '
BEGIN {FS = "\t"
    print "Starting to read the config file"
    num=0;
    bucle=0;
}

/POLITICS{/,/}/ {
    print
    if ( $0 ~ /^[0-9]*[\%|M|K|G][C|F]\t[A-Za-z|\,]*\t[\<|\>|\=][0-9]*[K|M|G]\t[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9][M|K|G]$/ ) {
        print "Instructions for politics"
    }
}

/CACHE SIZE{/,/}/ {
    print
    if ( $0 ~ /^cache_size=[0-9]*[K|M|G]/ ) {
        print "Instructions for cache size"
    }
}

/CRONTAB{/,/}/ {
    print
    if ( $0 ~ /crontab=[0-9]*|\*\t[0-9]*|\*\t[0-9]*|\*\t[0-9]*|\*\t[0-9]*|\*/ ) {
        print "Instructions for crontab"
    }
}
' infile

This gives the output: -

Starting to read the config file
CACHE SIZE{
cache_size=1000M
Instructions for cache size
}
CRONTAB{
crontab=1 12 2 10 5
Instructions for crontab
}
POLITICS{
100MF   doc,docx,xls    <5600K  02/04/02/02/02K
55MF    jpg     >300M   03/03/02/03/05K
65%F    mpg     =000M   02/02/02/05/05M
}

Hope that looks OK. I cannot get the web form to enter the code tags for me, so I am doing them from memory. Please note that your pattern for the POLITICS stuff does not seem to work. Cheers

---------- Post updated at 10:07 AM ---------- Previous update was at 10:04 AM ----------

I can't get the formatting to work, will have to wait till I get home tonight before I can post this properly. :frowning:

In the third part I think you understood what I want to do, but I think it's better for me to work with your second answer, because I only need to print or save in variables the values between {...}, while checking that they are headed by something like CACHE_SIZE{

What does /,/ mean in your code?

Still not 100% sure what you want as you have not given a complete output but take a look at this and tell me how close we are: -

OUTPUT: -

Starting to read the config file
CACHE SIZE{
cache_size=1000M
Instructions for cache
}
CRONTAB{
crontab=1 12 2 10 5
Instructions for crontab
}
POLITICS{
100MF   doc,docx,xls    <5600K  02/04/02/02/02K
55MF    jpg     >300M   03/03/02/03/05K
65%F    mpg     =000M   02/02/02/05/05M
}

CODE: -

TX5XN:/home/brad/forum/claw82>cat claw
nawk '

BEGIN {FS = "\t"
    print "Starting to read the config file"
    num=0
    bucle=0
}


/CACHE SIZE{/,/}/ {
    print
    if(  $0 ~ /^cache_size=[0-9]*[K|M|G]/ ){
        print "Instructions for cache"
    }
}

/CRONTAB{/,/}/ {
    print
    if ( $0 ~ /crontab=[0-9]*|\*\t[0-9]*|\*\t[0-9]*|\*\t[0-9]*|\*\t[0-9]*|\*/ ) {
        print "Instructions for crontab"
    }
}

/POLITICS{/,/}/ {
    print
    if ( $0 ~ /^[0-9]*[\%|M|K|G][C|F]\t[A-Za-z|\,]*\t[\<|\>|\=][0-9]*[K|M|G]\t[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9][M|K|G]$/ ) {
        print "Instructions for politics"
    }
}

' infile

The expression: -

/CACHE SIZE{/,/}/

Means: select every line from a line matching the pattern /CACHE SIZE{/ through the next line matching the pattern /}/, both included (an awk range pattern), as I learned yesterday on this forum. Within that range of lines we then have an if statement matching the patterns you supplied, although, as I said, one of them does not seem to work.
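A small sketch of how such a range behaves, on a made-up two-block file (the block names A and B are hypothetical). The braces are written as [{] and the end pattern is anchored as /^}$/, which is an assumption on my part to keep the range from ending early on a line that merely contains a closing brace.

```shell
infile=$(mktemp)
printf 'A{\nx=1\n}\n\nB{\ny=2\n}\n' > "$infile"

# "begin,end" selects every line from one matching "begin"
# through the next line matching "end", both inclusive.
out=$(awk '/^B[{]$/,/^}$/ { print NR ": " $0 }' "$infile")

printf '%s\n' "$out"
rm -f "$infile"
```

Only lines 5 to 7 of the sample are printed; the A block and the blank line fall outside the range.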

If this is not what you are looking for then copy my output above and change it to show what you are trying to achieve exactly.

Cheers

Ok, my output must be something like this:

for CACHE_SIZE:
1000M

for CRONTAB:
1 12 2 10 5

for POLITICS:
100MF   doc,docx,xls    <5600K  02/04/02/02/02K
55MF    jpg     >300M   03/03/02/03/05K
65%F    mpg     =000M   02/02/02/05/05M

These 3 outputs will come from 3 different executions or readings with my awk script.

I think that with this "/CACHE SIZE{/ to the pattern /}/" range I can read the values between the {...} and then apply another regexp to check whether the content matches. What do you think?

Thanks

This gives the output you suggest.

I have marked the places you should insert your code, they are inside the if statements containing the regular expressions you posted.

nawk '

BEGIN {FS = "\t"
    printf("%s\n\n", "Starting to read the config file")
    num=0
    bucle=0

}


/CACHE SIZE{/,/}/ {
    if ( $0 ~ /CACHE SIZE{/ ){
        printf("%s\n", "for CACHE SIZE:")
        next
    }
    if( $0 ~ /}/)
        next
    split($0, a, "=")
    printf("%s\n", a[2])
    if(  $0 ~ /^cache_size=[0-9]*[K|M|G]/ ){

        ## YOUR CODE HERE
        printf("%s\n\n", "Instructions for cache")
    }
}

/CRONTAB{/,/}/ {
    if ( $0 ~ /CRONTAB{/ ){
        printf("%s\n", "for CRONTAB:")
        next
    }
    if( $0 ~ /}/)
        next
    split($0, a, "=")
    printf("%s\n", a[2])
    if ( $0 ~ /crontab=[0-9]*|\*\t[0-9]*|\*\t[0-9]*|\*\t[0-9]*|\*\t[0-9]*|\*/ ) {

        ## YOUR CODE HERE
        printf("%s\n\n", "Instructions for crontab")
    }
}

/POLITICS{/,/}/ {
    if ( $0 ~ /POLITICS{/ ){
        printf("%s\n", "for POLITICS:")
        next
    }
    if( $0 ~ /}/)
        next
    print
    if ( $0 ~ /^[0-9]*[\%|M|K|G][C|F]\t[A-Za-z|\,]*\t[\<|\>|\=][0-9]*[K|M|G]\t[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9][M|K|G]$/ ) {

        ## YOUR CODE HERE
        print "Instructions for politics"
    }
}

' infile

Have you tested this code?

I don't know why, but for POLITICS my regexp doesn't work; the code doesn't show me the politics.

---------- Post updated at 12:34 PM ---------- Previous update was at 11:50 AM ----------

Maybe the problem is that I'm using awk, not nawk?

I found it did not work either, but I will leave you to figure out why. It is in the right place though, you just need to get the syntax right. I suggest cutting the expression right back to basics, as it probably does not need to be that complex now that the range I defined is getting you into the right area of the file. Good luck

I can't cut the expression down. I need the expression to be as strict as possible, because the user can write the first part of a line correctly and the second part badly, and then the expression mustn't match.

And I don't know why the expression doesn't work, because I have the same expression outside of the ranges and it works.

I see problems with spaces and tabs.

Thanks!
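One way to check the spaces-versus-tabs suspicion is to let awk show what it actually sees. This is only a sketch on a made-up two-line sample: the first line uses real tabs, the second uses runs of spaces, as can happen when text is copied from a web page. The <TAB> marker and the variable name visible are arbitrary choices.

```shell
infile=$(mktemp)
printf '65%%F\tmpg\t=000M\t02/02/02/05/05M\n'          >  "$infile"   # real tabs
printf '55MF    jpg     >300M   03/03/02/03/05K\n'     >> "$infile"   # spaces only

out=$(awk -F'\t' '{
    visible = $0
    gsub(/\t/, "<TAB>", visible)        # make every tab visible
    print NF " field(s): " visible      # NF is counted with FS = tab
}' "$infile")

printf '%s\n' "$out"
rm -f "$infile"
```

A line that reports 1 field contains no tab at all, so a regexp that requires \t between the columns cannot match it, whatever the rest of the expression looks like.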

I mean break the regular expression down into 3 separate expressions, one for each line, and build each slowly: -

/POLITICS{/,/}/ {
    if ( $0 ~ /POLITICS{/ ){
        printf("%s\n", "for POLITICS:")
        next
    }
    if( $0 ~ /}/)
        next
    print
  
    if ( $0 ~ /^[0-9]+[%]+[MKGCF]+[\t]+/){
        ## 65%F    mpg     =000M   02/02/02/05/05M
        print "Instructions for 65%"
    }
}

So you end up with three if statements each with a smaller expression, one for each line.

Output for the above: -

for POLITICS:
100MF   doc,docx,xls    <5600K  02/04/02/02/02K
55MF    jpg     >300M   03/03/02/03/05K
65%F mpg =000M 02/02/02/05/05M
Instructions for 65%

Always break the work down into manageable pieces
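Taking the same idea one step further, a sketch that keeps the check strict but tests each tab-separated field with its own small expression, so a failure points at the field that is wrong. The helper name check, the variable bad, the four-field layout and the sample lines (the third has a deliberately wrong unit X) are assumptions for the illustration, not the original script. Note that inside a bracket expression | is an ordinary character, so [KMG] is written here instead of [K|M|G], which would also accept a literal |.

```shell
infile=$(mktemp)
printf '100MF\tdoc,docx,xls\t<5600K\t02/04/02/02/02K\n' >  "$infile"
printf '65%%F\tmpg\t=000M\t02/02/02/05/05M\n'           >> "$infile"
printf '55MF\tjpg\t>300X\t03/03/02/03/05K\n'            >> "$infile"   # bad unit in field 3

out=$(awk -F'\t' '
    # Report field n of the current line if it does not match regexp re.
    function check(n, re) {
        if ($n !~ re) { printf "line %d: field %d rejected: %s\n", NR, n, $n; bad = 1 }
    }
    {
        bad = 0
        if (NF != 4) { printf "line %d: expected 4 fields, got %d\n", NR, NF; next }
        check(1, "^[0-9]+[%MKG][CF]$")
        check(2, "^[A-Za-z]+(,[A-Za-z]+)*$")
        check(3, "^[<>=][0-9]+[KMG]$")
        check(4, "^[0-9][0-9]/[0-9][0-9]/[0-9][0-9]/[0-9][0-9]/[0-9][0-9][MKG]$")
        if (!bad) print "line " NR ": ok"
    }
' "$infile")

printf '%s\n' "$out"
rm -f "$infile"
```

A line is accepted only if every field passes, so a user who gets the first part right and the second part wrong is still rejected.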

This subject is now closed........ :frowning:

OK, I understand! I had an error in the *

Thanks so much!!!