cmccabe
September 15, 2016, 11:08am
1
I am trying to extract the text after keywords from an HTML file. The keywords are reportLink": , "barcodedSamples": {" and {"barcodeSampleInfo": {" . Both the perl and awk commands run, but the output is just the entire index.html, not the desired output. Also, for reportLink": only the text after the second / until the third / is needed, but I do not think I accounted for that. Thank you :).
index.html
"reportLink": "/output/Home/Auto_user_S5-00580-5-Medexome_65_030/"", "status": "Completed", "timeStamp": "2016-09-01T18:32:18.000371+00:00"}], {"meta": {"limit": 20, "next": null, "offset": 0, "previous": null, "total_count": 6}, "objects": [{"barcodeId": "IonXpress", "barcodedSamples": {"MEV45": {"barcodeSampleInfo": {"IonXpress_007": {"controlSequenceType": "", "barcodedSamples": {"MEV46": {"barcodeSampleInfo": {"IonXpress_008"
perl -ne 'print if /reportLink":/ /"barcodedSamples": {"/ /{"barcodeSampleInfo": {"/' index.html > out
awk -v RS='' '/reportLink":/ /"barcodedSamples": {"/ /{"barcodeSampleInfo": {"/' index.html > out
desired output
Auto_user_S5-00580-4-Medexome_65_30
IonXpress_007 MEV45
IonXpress_008 MEV46
We'll need to see the HTML, not just the bit you want.
cmccabe
September 15, 2016, 12:59pm
3
I have attached the full file as it is quite large. Thank you :).
RudiC
September 15, 2016, 5:23pm
4
By no stretch of the imagination will your awk script run flawlessly. If the "patterns" were connected with OR operators and any of them turned out TRUE, the current line/record would be printed (the default action, which applies because you gave none). As your file is just ONE line/record, the entire file is printed.
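To illustrate the point, here is a minimal sketch using a made-up one-line sample rather than the attached file. It shows that patterns joined with `||` only select a record, which is then printed whole, and that `match()` with `substr()` is one way to print just the matched text. The variable names and the sample string are assumptions for illustration only.

```shell
# One-line sample standing in for the single-record index.html (made-up data).
sample='{"reportLink": "/output/Home/run_1/", "status": "Completed"}'

# Patterns joined with || select the record; the default action prints all of it.
whole=$(printf '%s\n' "$sample" | awk '/reportLink/ || /barcodedSamples/')

# match() sets RSTART and RLENGTH, so substr() can print only the matched text.
part=$(printf '%s\n' "$sample" | awk 'match($0, /\/output\/[^"]*/) { print substr($0, RSTART, RLENGTH) }')

printf 'whole record: %s\nmatched part: %s\n' "$whole" "$part"
```

The first command returns the sample unchanged, which is the same effect seen with index.html; the second returns only the path.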
RudiC
September 15, 2016, 5:53pm
5
Try (as a starting point)
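awk -F"[]\":{}, ]*" '
# Store the keywords as array indices; split() uses FS, which includes the comma.
BEGIN {for (n=split ("reportLink,barcodedSamples,barcodeSampleInfo", T); n>0; n--) SRCH[T[n]] = n
}
# Print the field that follows each keyword.
{for (i=1; i<NF; i++) if ($i in SRCH) print $(i+1)
}
' /tmp/6784d1473958785-extract-text-html-using-perl-awk-index-html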
MEV45
IonXpress_007
IonXpress_008
IonXpress_009
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/
/output/Home/Auto_user_S5-00580-5-Medexome_66_tn_031/
MEV42
IonXpress_004
IonXpress_005
IonXpress_006
/output/Home/Auto_user_S5-00580-4-Medexome_65_028/
/output/Home/Auto_user_S5-00580-4-Medexome_65_tn_029/
MEC1
IonXpress_001
IonXpress_002
IonXpress_003
/output/Home/medex60_8.13.16_027/
/output/Home/reanlzemedex60_023/
/output/Home/Auto_user_S5-00580-2-Medical_Exome_60_014/
/output/Home/Auto_user_S5-00580-2-Medical_Exome_60_tn_015/
MEC1
IonXpress_001
IonXpress_002
IonXpress_003
/output/Home/Medex59_8.11.2016_026/
/output/Home/MEDEX59_8.11-2016_025/
/output/Home/reanalyze59_8.10.16_024/
/output/Home/Auto_user_S5-00580-3-Medical_Exome_59_016/
chipDescription
/output/Home/Auto_user_S5-00580-1-IQOQ_RUN_Sample_2_51_012/
/output/Home/Auto_user_S5-00580-1-IQOQ_RUN_Sample_2_51_tn_013/
chipDescription
/output/Home/Auto_user_S5-00580-0-Test_Fragment_Run_49_010/
/output/Home/Auto_user_S5-00580-0-Test_Fragment_Run_49_tn_011/
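To get from this list to the layout in the desired output, one possible follow-up is sketched below. It assumes the structure shown in the posted excerpt, where a sample name directly follows barcodedSamples and a barcode directly follows barcodeSampleInfo. The output above suggests the full file can list several barcodes under one sample, so the pairing may need adjusting. The function name `extract_runs` and the inline sample line are made up for illustration, and the separator uses `+` instead of `*` so that it never matches an empty string.

```shell
# Hypothetical helper: print the run name from each reportLink and
# "barcode sample" pairs, reading from the given files or from stdin.
extract_runs() {
  awk -F'[]":{}, ]+' '
    {
      for (i = 1; i < NF; i++) {
        if ($i == "reportLink") {
          # "/output/Home/<run>/" splits into "", "output", "Home", "<run>", ""
          n = split($(i + 1), part, "/")
          if (n >= 4) print part[4]
        } else if ($i == "barcodedSamples") {
          # Remember the sample name until its barcode shows up.
          sample = $(i + 1)
        } else if ($i == "barcodeSampleInfo") {
          print $(i + 1), sample
        }
      }
    }
  ' "$@"
}

# Made-up sample line modelled on the posted excerpt.
printf '%s\n' '"reportLink": "/output/Home/Auto_user_S5-00580-5-Medexome_65_030/", "status": "Completed", "barcodedSamples": {"MEV45": {"barcodeSampleInfo": {"IonXpress_007": {"controlSequenceType": ""}}}}' |
  extract_runs
```

On the sample line this prints Auto_user_S5-00580-5-Medexome_65_030 followed by IonXpress_007 MEV45. Note that the run name is the third path component, which is what the desired output shows.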
cmccabe
September 16, 2016, 11:22am
6
Thank you very much, that gives me a good start :).