Find a pattern, traverse left, and pick something from another pattern

I have text like the following:

Detailed Table Information      Table(tableName:a1, dbName:default, owner:eedc_hdp_s_d-itm-e, createTime:1520514151, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:columna, type:string, comment:this is first column which has comment), FieldSchema(name:columnb, type:int, comment:null)], location:hdfs://DBDP-Dev/apps/hive/warehouse/a1, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{totalSize=0, rawDataSize=0, numRows=0, COLUMN_STATS_ACCURATE={"BASIC_STATS":"true"}, numFiles=0, transient_lastDdlTime=1520514151}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)

From this text I need to:

1) Check whether the text contains comment:null.
2) If it does, identify the parent table of that column from the string "Table(tableName:" (here the table name is a1). Then identify the name of the column whose comment is null (here columnb) from the string "FieldSchema(name:columnb, type:int, comment:null)".
3) Once I have both the column name and the table name, print

There is no comment for Column Columnname for table Tablename

as output, where Columnname and Tablename are the values found in step 2.

Can you please help me achieve this?

Note: I am on a Linux distribution with GNU Awk 3.1.8, using the Bash shell.

How would you apply or adapt what you have learned from your other threads to this new, but not that different, problem?

Hint: if you use ), ( or , as your record separator and : as your field separator, you are close to a solution:

$ awk -F: 'NF>1{print $1 "->" $2}' RS="[)(,]" infile | grep -E "[Nn]ame|comment"
tableName->a1
 dbName->default
name->columna
 comment->this is first column which has comment
name->columnb
 comment->null
name->null
skewedColNames->[]
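Following the hint through to the end, one possible sketch keeps the last tableName: and name: values it has seen and prints a line whenever a comment:null record turns up. It assumes an awk that accepts a regular expression as RS (gawk, including 3.1.8, and mawk do; POSIX awk only uses the first character of RS), and find_null_comments is just a hypothetical name for the wrapper.

```shell
#!/bin/sh
# Sketch only: finishes the RS/FS hint. Needs an awk with regex RS
# (gawk, mawk). find_null_comments is a hypothetical helper name.
find_null_comments() {
  awk -F: -v RS='[)(,]' '
    # Each record is one "key:value" chunk; keys may carry a leading blank.
    $1 ~ /^ *tableName$/ { table = $2 }   # remember the current table
    $1 ~ /^ *name$/      { col = $2 }     # remember the last name: value seen
    $1 ~ /^ *comment$/ && $2 == "null" {  # a column without a comment
      print "There is no comment for Column " col " for table " table
    }
  ' "$@"
}

# Demo on a shortened copy of the sample line
printf '%s\n' 'Detailed Table Information Table(tableName:a1, dbName:default, sd:StorageDescriptor(cols:[FieldSchema(name:columna, type:string, comment:this is first column which has comment), FieldSchema(name:columnb, type:int, comment:null)], location:hdfs://DBDP-Dev/apps/hive/warehouse/a1)' | find_null_comments
```

Because records ignore line boundaries here, a comment:null on a line without its own tableName: is reported against whichever table was seen last; the per-line approaches later in the thread do not have that issue.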

Thanks, RudiC and Chubler_XL, for the directions.

I am working on the problem. I have picked up Arnold Robbins' book on AWK and am trying to see how I can solve this.

Hopefully I will have something to show in another couple of days.

The problem might be a bit too complex to solve with a textbook in hand. There are several possible approaches; this may give you a starting point:

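awk -F, '
match ($0, /tableName:.*comment:null[^]]*[]]/)  {
        # keep only the part from the table name to the last null-comment column
        $0 = substr ($0, RSTART, RLENGTH)
        DL = ""                                  # separator between column names
        printf "There is no comment for Column "
        # start a new "line" at every column definition and split them into T
        gsub (/FieldSchema\(name:/, "\n")
        n = split ($0, T, "\n")
        for (i=2; i<=n; i++) if (T[i] ~ /comment:null/)  {
                # the column name runs up to the first comma
                printf "%s%s", DL, substr (T[i], 1, index (T[i], ",") - 1)
                DL = ","
                }
        # the table name starts at position 11, right after "tableName:"
        printf " for table %s\n", substr ($0, 11, index($0, ",") - 11)
        }
' file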
There is no comment for Column columnb for table a1

It already handles several uncommented columns in one table.

$ cat nv186000.file
Detailed Table Information Table(tableName:a1, dbName:default, owner:eedc_hdp_s_d-itm-e, createTime:1520514151, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:columna, type:string, comment:this is first column which has comment), FieldSchema(name:columnb, type:int, comment:null)], location:hdfs://DBDP-Dev/apps/hive/warehouse/a1, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{totalSize=0, rawDataSize=0, numRows=0, COLUMN_STATS_ACCURATE={"BASIC_STATS":"true"}, numFiles=0, transient_lastDdlTime=1520514151}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
Detailed Table Information Table(tableName:a2, dbName:default,
Testing filler FieldSchema(name:columnA, type:string, comment:null)
Detailed Table Information Table(tableName:a2, dbName:default, owner:eedc_hdp_s_d-itm-e, createTime:1520514151, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:columnA, type:string, comment:null), FieldSchema(name:columnB, type:int, comment:null)], location:hdfs://DBDP-Dev/apps/hive/warehouse/a1, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{totalSize=0, rawDataSize=0, numRows=0, COLUMN_STATS_ACCURATE={"BASIC_STATS":"true"}, numFiles=0, transient_lastDdlTime=1520514151}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)

$ perl -nle '($t) = /tableName:(\w+)/ or next; while(/name:(\w+),\s?type:\w+,\s?comment:null/g){print "There is no comment for Column $1 for table $t"}' nv186000.file

Output:

There is no comment for Column columnb for table a1
There is no comment for Column columnA for table a2
There is no comment for Column columnB for table a2
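For comparison, the perl one-liner maps fairly directly onto plain awk: take the table name from the line, then loop with match() over each name:...comment:null group. This is a sketch that uses only POSIX awk features (no regex RS), so it should also run on gawk 3.1.8; null_comment_lines is a hypothetical name, and [A-Za-z0-9_] stands in for perl's \w.

```shell
#!/bin/sh
# Sketch only: rough awk counterpart of the perl one-liner, one output
# line per uncommented column. null_comment_lines is a hypothetical name.
null_comment_lines() {
  awk '
    # skip lines without a table name, like perl "or next"
    match($0, /tableName:[A-Za-z0-9_]+/) {
      t = substr($0, RSTART + 10, RLENGTH - 10)   # text after "tableName:"
      s = $0
      while (match(s, /name:[A-Za-z0-9_]+, ?type:[A-Za-z0-9_]+, ?comment:null/)) {
        m = substr(s, RSTART, RLENGTH)
        # the column name sits between "name:" and the first comma
        print "There is no comment for Column " substr(m, 6, index(m, ",") - 6) " for table " t
        s = substr(s, RSTART + RLENGTH)           # continue after this match
      }
    }
  ' "$@"
}

# Demo on a shortened copy of the second table line
printf '%s\n' 'Detailed Table Information Table(tableName:a2, dbName:default, sd:StorageDescriptor(cols:[FieldSchema(name:columnA, type:string, comment:null), FieldSchema(name:columnB, type:int, comment:null)], location:x)' | null_comment_lines
```

As with the perl version, the filler line without tableName: produces nothing, and each column is reported on its own line rather than comma-joined.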

Thank you, Aia and RudiC.

RudiC,

You are right: I could not have come up with this solution in a short time with just a book.

But I do see a pattern in these solutions and can focus on that.

Thanks again!