Need help with sed/awk command

linuxUser1 · October 20, 2014, 12:58pm

Dear all,

I have a file named fileName with the following entries:

functions
{
    planeDictName
    {
        type            surfaces;
        functionObjectLibs ( "libsampling.so" );
        outputControl   timeStep;
        surfaceFormat   vtk;
        fields          ( p U );
        interpolationScheme cellPoint;
        surfaces
        (
            planeName
            {
                type plane;
                    basePoint (0.0 0.0 0.025);
                    normalVector (0 0 1);
                triangulate false;
                interpolate true;
            }
        );
    }


    planeDictName2
    {
        type            surfaces;
        functionObjectLibs ( "libsampling.so" );
        outputControl   timeStep;
        surfaceFormat   vtk;
        fields          ( p U );
        interpolationScheme cellPoint;
        surfaces
        (
            planeName
            {
                type plane;
                    basePoint (0.0 0.0 0.075);
                    normalVector (0 0 1);
                triangulate false;
                interpolate true;
            }
        );
    }
}

The entities in red (planeDictName, planeDictName2, planeName, and the fields p and U) are user-defined names. I want to read these names using awk, grep, or sed and store them in shell variables.

Can somebody help me?

Thanks & Regards,
linuxUser_

Corona688 · October 20, 2014, 1:08pm

What have you tried?

linuxUser1 · October 20, 2014, 1:33pm

That is a tough question.
I don't see any fixed condition for reading the parameters in this format, except the keyword functions :-o.
There is no rule that it has to be laid out like this:

functions {     planeDictName     {

It can just as well look like this:

functions {





    planeDictName     {

Extracting the string between the two delimiters { and } will not work in this case, although for a single line it could be done as follows:

sed 's/.*{\(.*\)}.*/\1/' file

But I am still stuck on my problem.

Thanks & Regards,
linuxUser_

---------- Post updated at 11:03 PM ---------- Previous update was at 10:54 PM ----------

An ideal solution might be:

Create an array of length n (2 in this case, as only the data for 2 planes needs to be written).

shell parameters:

dictName[1] = planeDictName
dictName[2] = planeDictName2
:
:
dictName[n] = .... dicts as many as present.

planeName[1] = planeName
planeName[2] = planeName
:
:
planeName[n] = .... planes as many as present.

And fields: there is no limit on the number of fields.

First question:
Is a shell script a good choice for this problem?

Corona688 · October 20, 2014, 1:37pm

Well, what you're asking for isn't trivial. There's not a one-liner I can wave at you to fix it, especially when you point out that your data isn't "pretty" the way you presented it. Neither awk nor sed nor most command-line tools are really suited to parsing a recursive grammar. You have to chew through it character by character.

Do you have a C compiler?

linuxUser1 · October 20, 2014, 1:45pm

Yes, gcc

RudiC · October 20, 2014, 2:02pm

I guess the regex [Pp]lane.*[Nn]ame.* (although ideal for the example given) won't work for the general case. Try to describe in plain English WHAT you want to extract.

---------- Post updated at 20:02 ---------- Previous update was at 19:55 ----------

Maybe a first step :

awk '{CNT=CNT + gsub (/{/,"") - gsub(/}/,""); if (CNT==1 && !/^ *$/) print}' file
    planeDictName
    planeDictName2

linuxUser1 · October 20, 2014, 2:08pm

In functions{}, there will be many sub-dicts.
One of them is:

    planeDictName
     {
         type            surfaces;
         functionObjectLibs ( "libsampling.so" );
         outputControl   timeStep;
         surfaceFormat   vtk;
         fields          ( p U );
         interpolationScheme cellPoint;
         surfaces
         (
             planeName
             {                 type plane;
                     basePoint (0.0 0.0 0.025);
                     normalVector (0 0 1);
                 triangulate false;
                 interpolate true;
             }
         );
     }

From this I want to store the names (in red colour) in shell variables.

For this case, consider the shell variables
dictName, planeName, fieldNames:

dictName = planeDictName
planeName = planeName
fieldNames[1] = p
fieldNames[2] = U

Corona688 · October 20, 2014, 2:36pm

Another question. If these names are user defined, how will I know I'm in the right structure and not the wrong one, if not by the name?

linuxUser1 · October 20, 2014, 2:40pm

It must be a continuous string containing only a-z and 0-9; otherwise the application will not work.

Typical names are as follows:
sampleData1
dataForPlane1
etc..,

RudiC · October 20, 2014, 2:51pm

sampleData1 is more than a-z 0-9 .

linuxUser1 · October 20, 2014, 2:53pm

You mean that D does not belong to it?
Sorry, please add A-Z as well.

RudiC · October 20, 2014, 3:00pm

Try

awk     '                       {CNT=CNT+gsub(/{/,"")-gsub(/}/,""); if (CNT==1 && !/^ *$/) print "dictname[" ++dc "]=" $0}
         /fields/               {for (i=3; i<NF; i++) print "fieldname[" ++fc "]="$i}
         /^ *surfaces */        {S=1; next}
         S                      {gsub(/[^0-9A-Za-z]*/, ""); if (!/^ *$/) {print "planename[" ++pc "]=" $0; S=0}}
        ' file
dictname[1]=    planeDictName
fieldname[1]=p
fieldname[2]=U
planename[1]=planeName
dictname[2]=    planeDictName2
fieldname[3]=p
fieldname[4]=U
planename[2]=planeName

linuxUser1 · October 20, 2014, 3:09pm

Thanks a lot... it's working fine up to this point.

I want to add one more delimiter to exclude any other entries with {}, as the current script searches for strings inside any {}.
The ideal condition is functions{ "read here only" }

RudiC · October 20, 2014, 3:27pm

Use the "surfaces" line as an example to implement your own solution.

linuxUser1 · October 21, 2014, 2:35am

One more issue:

The number of fields is not fixed.
I mean, let's say in dict1 the fields are p and U;
in dict2 the field may be T.

What I mean is: instead of saving the field names as one continuous list, something like dict[1].fieldName[1] = p, dict[1].fieldName[2] = U, etc.
would be less ambiguous.

---------- Post updated at 12:05 PM ---------- Previous update was at 01:30 AM ----------

Can you explain this condition?

if (CNT==1 && !/^ *$/)

RudiC · October 21, 2014, 6:29am

Why don't you give it a try? Every fieldname will have its own array element.
EDIT: Oh, got you now. Shells don't have those structures. Recent shells with associative arrays might allow for an approach coming close...
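
As a rough illustration of that associative-array idea, the following sketch emulates dict[1].fieldName[2] with a composite key such as "1,2". It assumes bash 4 or later; the key layout and the helper array nFields are hypothetical choices, not something from the thread, and the values are hard-coded where a parser would normally fill them in.

```shell
#!/bin/bash
# Requires bash 4 or later for associative arrays (declare -A).
declare -A fieldName     # key "d,f" = field f of dictionary d (hypothetical layout)
declare -a nFields       # nFields[d] = number of fields in dictionary d

# Hard-coded values standing in for what a parser would deliver.
fieldName[1,1]=p
fieldName[1,2]=U
nFields[1]=2
fieldName[2,1]=T
nFields[2]=1

# Walk every dictionary and its own list of fields.
for d in "${!nFields[@]}"; do
    for ((f = 1; f <= nFields[d]; f++)); do
        echo "dict[$d].fieldName[$f] = ${fieldName[$d,$f]}"
    done
done
```

Each dictionary keeps its own field numbering this way, so the fields of the second dictionary start again at 1 instead of continuing the global count.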

CNT represents the "level" of "{...}" nestings. So, if the level is 1 deep, and if there's more than an "empty" (nothing but spaces) line, print that. This, of course, depends heavily on the structure of your file. If all the info were written in a single line, that logic would be doomed.
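
One way to make the level counting less dependent on the line layout is to split the input into tokens first, so that braces sharing a line with names are still counted one by one. The sketch below is only an illustration under two assumptions: names contain no spaces (as stated earlier in the thread), and a word at level 1 that is directly followed by "{" is a dictionary name. The sample input is made up, and the sketch does not yet restrict itself to the functions block.

```shell
#!/bin/bash
# Hypothetical sample input with "ugly" layout: braces share lines with names.
sample=$(mktemp)
cat > "$sample" <<'EOF'
functions { planeDictName {
    type surfaces;
    fields ( p U );
    surfaces ( planeName { type plane; } );
} planeDictName2
{
    type surfaces; }
}
EOF

# Put every brace on its own token, then track nesting depth token by token.
# A word seen at depth 1 that is directly followed by "{" is taken as a dict name.
dictNames=$(awk '
    { gsub(/[{}]/, " & "); n = split($0, tok, /[ \t]+/)
      for (i = 1; i <= n; i++) {
          t = tok[i]
          if (t == "") continue
          if (t == "{") { if (depth == 1 && prev != "") print prev; depth++ }
          else if (t == "}") depth--
          prev = (t == "{" || t == "}") ? "" : t
      }
    }' "$sample")
rm -f "$sample"
echo "$dictNames"
```

Because the previous token is remembered across lines, it does not matter whether the opening brace sits on the same line as the name or on a later one.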

linuxUser1 · October 21, 2014, 6:37am

#!/bin/bash
declare -a dictName;
declare -a planeName;
declare -a fieldName;
dCount=0;
pCount=0;
fCount=0;

awk     '/^ *functions */       {F=1; next}
         F                      {CNT=CNT+gsub(/{/,"")-gsub(/}/,""); if (CNT==1 && !/^ *$/) {dictName[dCount]=$0; dCount=$((dCount+1))} }
         /fields/               {for (i=3; i<NF; i++) {fieldName[fCount]=$i; fCount=$((fCount+1))}}
         /^ *surfaces */        {S=1; next}
         S                      {gsub(/[^0-9A-Za-z]*/, ""); if (!/^ *$/) {planeName[pCount]=$i; pCount=$((pCount+1)); S=0}}
        ' file

Hi, will something like the above work?

I am able to get whatever names I want, but I am unable to store them in a string array.

RudiC · October 21, 2014, 6:51am

I don't think that will work. You can't use shell variables inside an awk script. There is a mechanism to pass variables (cf. man awk), but you won't get back any values into variables except by command substitution. On top of that, you stopped using that F logical variable halfway. Try, as the second line,

!F {next}

and then, somewhere reasonable, something containing

!CNT {exit}

This may fail if the input file structure is different from the one you presented.
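
Put together with the earlier script, the two suggested lines might look like the sketch below. It assumes that the keyword functions stands alone on its line, and it adds a hypothetical flag named seen so that the exit is not triggered before the opening brace has been read. The input file with the unrelated blocks before and after is invented for the demonstration.

```shell
#!/bin/bash
# Hypothetical input: an unrelated block precedes and follows "functions".
input=$(mktemp)
cat > "$input" <<'EOF'
application
{
    solverName
    {
    }
}
functions
{
    planeDictName
    {
        fields          ( p U );
    }
    planeDictName2
    {
        fields          ( T );
    }
}
other
{
    ignoredName
    {
    }
}
EOF

result=$(awk '
    /^ *functions *$/ { F = 1; next }       # start of the block of interest
    !F                { next }              # skip everything before it
    { CNT += gsub(/[{]/, "") - gsub(/[}]/, "") }
    CNT               { seen = 1 }          # the block has been opened
    seen && !CNT      { exit }              # closing brace of "functions" reached
    CNT == 1 && !/^ *$/ { gsub(/ /, ""); print "dictname[" ++dc "]=" $0 }
' "$input")
rm -f "$input"
echo "$result"
```

Only planeDictName and planeDictName2 are reported; solverName and ignoredName lie outside the functions block and are skipped.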

You could try printing all the assignments to a file and then source that file from your shell. Or, in recent shells with "process substitution", something like

. <(awk 'BEGIN {print "X=17"}')
echo $X
17
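
The two mechanisms mentioned above, passing a value into awk and getting a result back by command substitution, can be sketched as follows. The -v option is standard awk; the variable names keyword, key and fieldList as well as the one-line input file are assumptions made for this example.

```shell
#!/bin/bash
# Hypothetical input holding one "fields" line, as in the sample file.
input=$(mktemp)
printf '%s\n' '        fields          ( p U T );' > "$input"

keyword=fields    # shell variable handed to awk through -v

# awk sees the shell value as its own variable "key"; the result comes back
# to the shell only through the command substitution $(...).
fieldList=$(awk -v key="$keyword" '
    $1 == key { for (i = 3; i < NF; i++) printf "%s%s", (i > 3 ? " " : ""), $i
                print "" }
' "$input")
rm -f "$input"

echo "fields found: $fieldList"
```

Nothing assigned inside the awk program is visible to the shell afterwards; only what awk prints is captured.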

linuxUser1 · October 21, 2014, 7:21am

OMG... I see there is a lot more to understand to finish this job.

Can you give me an example of storing a variable like this?

declare -a dictName
j=0
while(some condition that satisfies my requirement-for looping)
dictName[j] <(awk '/^ *functions */       {F=1; next}
         F                      {CNT=CNT+gsub(/{/,"")-gsub(/}/,""); if (CNT==1 && !/^ *$/) {dictName[dCount]=$0; dCount=$((dCount+1))} }')

Thanks and Regards,
linuxUser_

---------- Post updated at 04:51 PM ---------- Previous update was at 04:49 PM ----------

One more thing: why is there a space before the dictNames when printing?
Can I remove that space?

dictname[1]=[SPACE]planeDictName

RudiC · October 21, 2014, 7:28am

As I said, awk does NOT use shell variables like that, nor the $((...+1)) shell construct. As you are using bash, the process substitution might work. If

awk '                       {CNT=CNT+gsub(/{/,"")-gsub(/}/,"");
                                 if (CNT==1 && !/^ *$/) {gsub (/ /,_); print "dictname[" ++dc "]=" $0}
                                }
         /fields/               {for (i=3; i<NF; i++) print "fieldname[" ++fc "]="$i}
         /^ *surfaces */        {S=1; next}
         S                      {gsub(/[^0-9A-Za-z]*/, ""); if (!/^ *$/) {print "planename[" ++pc "]=" $0; S=0}}
        ' file

produces (here with T added to the fields of planeDictName2 in the input file)

dictname[1]=planeDictName
fieldname[1]=p
fieldname[2]=U
planename[1]=planeName
dictname[2]=planeDictName2
fieldname[3]=p
fieldname[4]=U
fieldname[5]=T
planename[2]=planeName

, sourcing the process substitution

. <(awk    '            {CNT=CNT+... )

will assign all those variables:

echo ${dictname[@]} ${planename[@]} ${fieldname[@]} 
planeDictName planeDictName2 planeName planeName p U p U T

But be careful with that sourcing, as any malicious content in the output will be executed as well!
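
A more defensive variant, shown here only as a sketch, reads the awk output as plain data instead of sourcing it, and keeps only names made of letters and digits (the rule stated earlier in the thread). The temporary file and its hostile third line are invented to show the effect; in practice the loop would read from the awk command.

```shell
#!/bin/bash
# Hypothetical awk output; the last line imitates a hostile "name".
generated=$(mktemp)
cat > "$generated" <<'EOF'
planeDictName
planeDictName2
$(touch /tmp/should_not_exist); echo
EOF

# Read the lines as plain data instead of sourcing them, and keep only
# names made of letters and digits.
declare -a dictname
while IFS= read -r line; do
    if [[ $line =~ ^[A-Za-z0-9]+$ ]]; then
        dictname+=("$line")
    else
        echo "skipping suspicious entry: $line" >&2
    fi
done < "$generated"
rm -f "$generated"

echo "${#dictname[@]} names: ${dictname[*]}"
```

Note that an array filled this way is indexed from 0, so the first name is ${dictname[0]} rather than ${dictname[1]}.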