awk for text processing

Hi, my file is in this format:

"[ {
  \"_id\": \"56190\",
  \"_score\": 1.0,
  \"generif\": [
    {
      \"pubmed\": 21764855,
      \"text\": \"loss of RNPC1 in mouse embryonic fibroblasts increased the level of p53 protein, leading to enhanced premature senescence in a p53-dependent manner\"
    },
    {
      \"pubmed\": 22371495,
      \"text\": \"a novel mechanism by which HuR is regulated by RNPC1 via mRNA stability and HuR is a mediator of RNPC1-induced growth suppression.\"
    },
    {
      \"pubmed\": 22508983,
      \"text\": \"knockdown of p73 or p21, another target of RNPC1, attenuates the inhibitory effect of RNPC1 on cell proliferation and premature senescence, whereas combined knockdown of p73 and p21 completely abolishes it\"
    },
    {
      \"pubmed\": 23836903,
      \"text\": \"knockdown of MIC-1 can decrease RNPC1-induced cell growth suppression.\"
    },
    {
      \"pubmed\": 25512531,
      \"text\": \"Rbm38 deficiency markedly decreases the tumor penetrance in mice heterozygous for p53 via enhanced p53 expression.\"
    },
    {
      \"pubmed\": 28850611,
      \"text\": \"The hearts of Rbm38 -/- mice were mildly hypertrophic, but cardiac function was not affected. Furthermore, Rbm38 deficiency did not affect cardiac remodeling (i.e. hypertrophy, LV dilation and fibrosis) or performance (i.e. fractional shortening) after pressure-overload induced by transverse aorta constriction.\"
    }
  ],
  \"symbol\": \"Rbm38\"
} ]"

I want to convert it to a more readable format, like this:

_id pubmed  text  symbol    
67196 18667844  Overexpression of UBE2T in NIH3T3 cells significantly promoted colony formation in mouse cell cultures  Ube2t
56190 21764855  loss of RNPC1 in mouse embryonic fibroblasts increased the level of p53 protein, leading to enhanced premature senescence in a p53-dependent manner Rbm38
56190 22371495  a novel mechanism by which HuR is regulated by RNPC1 via mRNA stability and HuR is a mediator of RNPC1-induced growth suppression Rbm38
56190 22508983  knockdown of p73 or p21, another target of RNPC1, attenuates the inhibitory effect of RNPC1 on cell proliferation and premature senescence, whereas combined knockdown of p73 and p21 completely abolishes it  Rbm38
56190 23836903  knockdown of MIC-1 can decrease RNPC1-induced cell growth suppression.  Rbm38
56190 25512531  Rbm38 deficiency markedly decreases the tumor penetrance in mice heterozygous for p53 via enhanced p53 expression Rbm38
56190 28850611  The hearts of Rbm38 -/- mice were mildly hypertrophic, but cardiac function was not affected. Furthermore, Rbm38 deficiency did not affect cardiac remodeling (i.e. hypertrophy, LV dilation and fibrosis) or performance (i.e. fractional shortening) after pressure-overload induced by transverse aorta constriction. Rbm38


Welcome to the forum.

Any attempts / ideas / thoughts from your side?

Where does the 67196 18667844 info come from?

I am not sure how to edit my post. I included that line by mistake; please read the expected output from the second line onwards. I was just trying something like the below to get what I wanted:

sed 's/:/\t/' t.txt | awk '{ gsub(/\[/,"") }1'

How about

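awk -F: '
                {gsub (/[\\",]/, _)             # strip backslashes, quotes and commas (also the commas inside the text); _ is unset, i.e. empty
                }
/^ *_id/        {ID = $2
                }
/^ *pubmed/     {PM[++CUR] = $2                 # start a new record for each pubmed entry
                }
/^ *text/       {TX[CUR] = $2                   # $2 stops at the next colon, should the text contain one
                }
/^ *symbol/     {SY = $2
                }
/^ *\} \]/      {for (p = 1; p <= CUR; p++) print ID, PM[p], TX[p], SY     # indexed loop: "for (p in PM)" has no guaranteed order
                 CUR = 0
                 split ("", PM)
                }
' file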
 56190  21764855  loss of RNPC1 in mouse embryonic fibroblasts increased the level of p53 protein leading to enhanced premature senescence in a p53-dependent manner  Rbm38
 56190  22371495  a novel mechanism by which HuR is regulated by RNPC1 via mRNA stability and HuR is a mediator of RNPC1-induced growth suppression.  Rbm38
 56190  22508983  knockdown of p73 or p21 another target of RNPC1 attenuates the inhibitory effect of RNPC1 on cell proliferation and . . .   Rbm38
 56190  23836903  knockdown of MIC-1 can decrease RNPC1-induced cell growth suppression.  Rbm38
 56190  25512531  Rbm38 deficiency markedly decreases the tumor penetrance in mice heterozygous for p53 via enhanced p53 expression.  Rbm38
 56190  28850611  The hearts of Rbm38 -/- mice were mildly hypertrophic but cardiac function was not affected. Furthermore Rbm38 deficiency did not affect . . .  Rbm38
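
If the commas inside the text matter and you want real tab-separated columns, something along these lines might work. It is only a sketch, assuming the file always has one key per line with \" escaped quotes exactly as posted; sample.txt, result.tsv and the val() helper are names made up for this example, and the sample is shortened to two of the posted entries.

```shell
# Recreate a shortened sample of the posted input (two generif entries).
cat > sample.txt <<'EOF'
"[ {
  \"_id\": \"56190\",
  \"_score\": 1.0,
  \"generif\": [
    {
      \"pubmed\": 21764855,
      \"text\": \"loss of RNPC1 in mouse embryonic fibroblasts increased the level of p53 protein, leading to enhanced premature senescence in a p53-dependent manner\"
    },
    {
      \"pubmed\": 23836903,
      \"text\": \"knockdown of MIC-1 can decrease RNPC1-induced cell growth suppression.\"
    }
  ],
  \"symbol\": \"Rbm38\"
} ]"
EOF

awk '
# Hypothetical helper: return the value part of a   \"key\": value   line.
function val(s) {
        sub(/^[^:]*: */, "", s)       # drop everything up to the first colon
        sub(/, *$/, "", s)            # drop the trailing comma, if any
        gsub(/\\"/, "", s)            # drop the escaped quotes \"
        return s
}
BEGIN                   { OFS = "\t"; print "_id", "pubmed", "text", "symbol" }
/^ *\\"_id\\":/         { ID = val($0) }
/^ *\\"pubmed\\":/      { PM[++CUR] = val($0) }
/^ *\\"text\\":/        { TX[CUR] = val($0) }
/^ *\\"symbol\\":/      { SY = val($0) }
/^ *\} *\]/             { for (p = 1; p <= CUR; p++) print ID, PM[p], TX[p], SY
                          CUR = 0 }
' sample.txt > result.tsv

cat result.tsv
```

Point it at your real file instead of sample.txt. Only the first colon on a line is used to split key from value, so commas and any colons inside the text survive, and the rows come out in input order.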