awk capturing the first sample, but not the subsequent IDs

In the awk below, the first sample, MEV45, is extracted from the input, but the subsequent samples MEV46 and MEV47 are not, because the parse does not pick them up. I cannot seem to add them to the code. Thank you very much @RudiC, your awk is very nice :).

input

{"barcodeId": "IonXpress", "barcodedSamples": {"MEV45": {"barcodeSampleInfo": {"IonXpress_007": {"controlSequenceType": "", "description": "", "externalId": "", "hotSpotRegionBedFile": "", "nucleotideType": "DNA", "reference": "hg19", "targetRegionBedFile": "/results/uploads/BED/6/hg19/unmerged/detail/LCHv2_IDP.bed"}}, "barcodes": ["IonXpress_007"]}, "MEV46": {"barcodeSampleInfo": {"IonXpress_008": {"controlSequenceType": "", "description": "", "externalId": "", "hotSpotRegionBedFile": "", "nucleotideType": "DNA", "reference": "hg19", "targetRegionBedFile": "/results/uploads/BED/6/hg19/unmerged/detail/LCHv2_IDP.bed"}}, "barcodes": ["IonXpress_008"]}, "MEV47": {"barcodeSampleInfo": {"IonXpress_009": {"controlSequenceType": "",

current output

MEV45
IonXpress_007
IonXpress_008
IonXpress_009

desired output

MEV45
IonXpress_007
MEV46
IonXpress_008
MEV47
IonXpress_009

awk

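awk -F"[]\":{}, ]*" '
# Build a lookup of the two keys; split() with two arguments uses FS, so the comma separates them
BEGIN   {for (n=split ("barcodedSamples,barcodeSampleInfo", T); n>0; n--) SRCH[T[n]] = n
        }
# Print the field after any key in SRCH. "barcodedSamples" occurs only once (before MEV45),
# while every barcode follows a "barcodeSampleInfo", hence the current output.
        {for (i=1; i<NF; i++) if ($i in SRCH) print $(i+1)
        }

' input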
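To see why only MEV45 appears, it can help to dump the fields awk actually produces. The sketch below is illustrative only: the function name `dump_fields` and the shortened `sample` line are hypothetical, and it uses `+` instead of `*` in the separator as a portability precaution (an assumption, not something the question requires). In its output, `barcodedSamples` occurs once, directly before MEV45, while MEV46 comes right after the previous sample's `IonXpress_007` entry, so the lookup in the question never prints it.

```shell
#!/bin/sh
# Hypothetical helper: print each field as "index<TAB>value", using the
# same separator characters as the question (with + instead of *).
dump_fields() {
    awk -F'[]":{}, ]+' '{ for (i = 1; i <= NF; i++) printf "%d\t%s\n", i, $i }' "$@"
}

# Made-up, shortened line in the same shape as the question input
sample='{"barcodedSamples": {"MEV45": {"barcodeSampleInfo": {"IonXpress_007": {"reference": "hg19"}}, "barcodes": ["IonXpress_007"]}, "MEV46": {"barcodeSampleInfo": {"IonXpress_008": {"reference": "hg19"}}, "barcodes": ["IonXpress_008"]}}}'

printf '%s\n' "$sample" | dump_fields
```

Note that `[` is not in the separator set, so it shows up as a field of its own before each repeated barcode.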

How about this?

awk -F"[]\":{}, ]*" '
        # sample name = field just before "barcodeSampleInfo", barcode = field just after
        {for (i=1; i<NF; i++) if ($i =="barcodeSampleInfo") print $(i-1) RS $(i+1)
        }

' file
MEV45
IonXpress_007
MEV46
IonXpress_008
MEV47
IonXpress_009
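A commented, self-contained sketch of the same idea, assuming a single-line JSON input like the one in the question. The function name `extract_samples` and the `sample` line are made up for illustration. The loop starts at field 2 so that `$(i - 1)` can never become `$0` (the whole record), and `"\n"` is written out instead of `RS`, which is equivalent with the default RS but keeps one item per line even if RS is changed.

```shell
#!/bin/sh
# Hypothetical wrapper around the approach above: for every field equal to
# "barcodeSampleInfo", print the field before it (sample) and after it (barcode).
extract_samples() {
    awk -F'[]":{}, ]+' '
    {
        for (i = 2; i < NF; i++)
            if ($i == "barcodeSampleInfo")
                print $(i - 1) "\n" $(i + 1)   # sample name, then its barcode
    }' "$@"
}

# Made-up, shortened line in the same shape as the question input
sample='{"barcodedSamples": {"MEV45": {"barcodeSampleInfo": {"IonXpress_007": {"reference": "hg19"}}, "barcodes": ["IonXpress_007"]}, "MEV46": {"barcodeSampleInfo": {"IonXpress_008": {"reference": "hg19"}}, "barcodes": ["IonXpress_008"]}}}'

printf '%s\n' "$sample" | extract_samples
```

Because it keys on `barcodeSampleInfo` rather than on `barcodedSamples`, every sample is found regardless of how many there are.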

Hello cmccabe,

Could you please try the following and let me know if it helps.

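awk '
function remov(a){gsub(/[\{\":]/,X,a);print a}          # X is never set (empty): delete { " : and print
{if($0 ~ /MEV/){remov($0);getline;getline;remov($0);}}  # sample name, then the barcode two records later
' RS=" " Input_file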

Output will be as follows.

MEV45
IonXpress_007
MEV46
IonXpress_008
MEV47
IonXpress_009
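That answer treats every space-separated token as a record (`RS=" "`), strips `{`, `"` and `:`, and relies on the barcode being exactly two tokens after a token containing `MEV`. Below is a hypothetical variant of the same token idea (not part of the answer): it remembers the previous token, prints it when the `{"barcodeSampleInfo":` token arrives, and prints the next token as the barcode, so it needs neither `getline` nor the `MEV` prefix. The names `extract_tokens` and `sample` are made up. Like the original, it assumes single spaces after `:` and `,` as in the question input; pretty-printed JSON would likely break both.

```shell
#!/bin/sh
# Hypothetical variant: one space-separated token per record.
extract_tokens() {
    awk -v RS=' ' '
    function clean(s) { gsub(/[{}":]/, "", s); return s }   # drop JSON punctuation
    want_barcode { print clean($0); want_barcode = 0 }       # token after the key = barcode
    $0 == "{\"barcodeSampleInfo\":" { print clean(prev); want_barcode = 1 }  # token before = sample
    { prev = $0 }                                            # remember for the next record
    ' "$@"
}

# Made-up, shortened line in the same shape as the question input
sample='{"barcodedSamples": {"MEV45": {"barcodeSampleInfo": {"IonXpress_007": {"reference": "hg19"}}, "barcodes": ["IonXpress_007"]}, "MEV46": {"barcodeSampleInfo": {"IonXpress_008": {"reference": "hg19"}}, "barcodes": ["IonXpress_008"]}}}'

printf '%s\n' "$sample" | extract_tokens
```

The `want_barcode` rule is placed before the key match on purpose: the barcode arrives on the record after the key, so the flag must be checked before it can be set again.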

Thanks,
R. Singh


Thank you both very much, it works perfectly :).