Format the Output

Hi-
The objective of the task is to print the START lines which don't have a corresponding END line.

Let me know if anyone has a better way of doing this.

My Thoughts

Have 2 files, one holding the START lines and the other holding the END lines (both sorted), and then diff the files to get the lines which don't have an END statement.
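A rough sketch of that split-and-diff idea in Python, in case it helps. Instead of two sorted files it keeps two counters, one for the START keys and one for the END keys, and subtracts them. The function name unmatched_starts and the use of collections.Counter are my own choices for illustration; it also assumes every line looks like KEY;START or KEY;END and it ignores the order of the lines.

```python
from collections import Counter

def unmatched_starts(lines):
    """Return the START lines left over after cancelling one START per END of the same key."""
    starts = Counter()  # how many START lines each key has
    ends = Counter()    # how many END lines each key has
    for line in lines:
        key, _, marker = line.strip().partition(";")
        if marker == "START":
            starts[key] += 1
        elif marker == "END":
            ends[key] += 1
    # Counter subtraction drops keys whose count is zero or negative,
    # so only keys with more STARTs than ENDs remain.
    leftover = starts - ends
    return [key + ";START" for key in sorted(leftover.elements())]

sample = ["ABC;START", "ABC;END", "GHF;START", "GHF;END",
          "ABC;START", "BHY;START", "BHY;END"]
print(unmatched_starts(sample))  # ['ABC;START']
```

Because only the counts are compared, an END that appears before its START would still cancel it. If that matters, the lines have to be processed in order instead.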

Input


ABC;START
ABC;END
GHF;START
GHF;END
ABC;START
BHY;START
BHY;END


Output should be: ABC;START

since this line doesn't have a corresponding "ABC;END" line.

Try...

awk '/END/{c=0}/START/{c++;if(c>1)print p;p=$0}' file1

Please check it on your end with this input:

ABC;START
ABC;END
GHF;START
ABC;START
BHY;START

output:

# awk '/END/{c=0}/START/{c++;if(c>1)print p;p=$0}' file
GHF;START
ABC;START

Nice catch, simple to fix...

awk '/END/{c=0}/START/{c++;if(c>1)print p;p=$0}END{if(c>1)print p}' file1

How about this input:

ABC;START
GHF;START
GHF;END
BHY;START
BHY;END
ABC;END

output:

# awk '/END/{c=0}/START/{c++;if(c>1)print p;p=$0}END{if(c>1)print p}' file
ABC;START

@OP, if you have Python and can use it, here's an alternative

# Group the START/END markers by key (the field before ";").
d = {}
with open("file") as f:
    for line in f:
        first, second = line.strip().split(";")
        d.setdefault(first, []).append(second)
# Report every key whose START and END counts differ.
for key, markers in d.items():
    if markers.count("START") != markers.count("END"):
        print(key, markers)

output:

# more file
ABC;START
GHF;START
BHY;START
BHY;END
ABC;END
# ./test.py
GHF ['START']

Well, I think I should have explained more. The above code checks only the START/END pattern, but the key in front of it (column 1) is also important.

For the input below, the awk command does not return any rows, but it should have printed:

ABC;START
BHY;START

Input

ABC;START
TTT;END
BHY;START
TTT;END

For every column1 value (e.g. ABC) there should be a START line and an END line (the END line need not necessarily be the line right after the START). If there is no END line for a column1 value, then we need to print that column1 value (or the entire START line). Hope this explains it.
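To make that rule concrete, here is a small Python sketch of how I read it. It walks the lines in order, remembers every START that is still open, and lets an END close the most recent open START with the same column1. The function name open_starts is just something made up for this example, and treating an END with no earlier START as harmless is my assumption, not part of the requirement.

```python
def open_starts(lines):
    """Return the START lines that are never closed by a later END with the same key."""
    pending = []  # START lines still waiting for an END, in input order
    for line in lines:
        line = line.strip()
        key, _, marker = line.partition(";")
        if marker == "START":
            pending.append(line)
        elif marker == "END":
            # Close the most recent open START for this key, if there is one.
            for i in range(len(pending) - 1, -1, -1):
                if pending[i].split(";")[0] == key:
                    del pending[i]
                    break
    return pending

print(open_starts(["ABC;START", "TTT;END", "BHY;START", "TTT;END"]))
# ['ABC;START', 'BHY;START']
```

The nested input from earlier (ABC;START ... ABC;END with GHF and BHY pairs in between) comes back empty with this approach, since the END does not have to follow its START directly.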

Thanks for your inputs.

If you don't have Python, then try...

awk -F ';' '/END/{c[$1]=0}/START/{c[$1]++}END{for(i in c)if(c[i]>0)print i}' file1

Not sure if you are looking for the immediate START/END pattern:

awk -F ";" 'NR ==1 {prev=$1;str=$2;getline}{if ($1 == prev && $2 == "END"){getline;prev=$1;str=$2}else {print prev,str;prev=$1;str=$2};next}' file
awk -F ";" '{if ($1 == prev && $2 == "END"){getline;prev=$1;str=$2}else {print prev,str;prev=$1;str=$2};next}' file

cheers,
Devaraj Takhellambam