Need to prepare a CSV table with inputs from multiple files

Hi,

We have thousands of Video files and we need to prepare a table with few of it's parameters/values from their Mediainfo output. Here I have 3 sample mediainfo output text files named Vid1, Vid2 & Vid3 so on... (such we have thousands) and each file has exactly 3 lines.

$ ls
Required_para.txt  Vid1  Vid2  Vid3

$ wc -l V*
   3 Vid1
   3 Vid2
   3 Vid3
   9 total
$ cat Vid1
Video : HEVC / Main 10@L4@Main / 5 750 kb/s / 1 920 pixels / 1 080 pixels 
Writing library : x265
Encoding settings                        : cpuid=1111039 / frame-threads=5 / numa-pools=32,0 / wpp / no-pmode / no-pme / no-psnr / no-ssim / log-level=2 / input-csp=1 / input-res=1920x816 / interlace=0 / total-frames=193561 / level-idc=0 / high-tier=1 / uhd-bd=0 / ref=5 / no-allow-non-conformance / no-repeat-headers / annexb / no-aud / no-hrd / info / hash=0 / no-temporal-layers / open-gop / min-keyint=23 / keyint=250 / gop-lookahead=0 / bframes=8 / b-adapt=2 / b-pyramid / bframe-bias=0 / rc-lookahead=100 / lookahead-slices=4 / scenecut=40 / radl=0 / no-splice / no-intra-refresh / ctu=32 / min-cu-size=8 / no-rect / no-amp / max-tu-size=16 / tu-inter-depth=1 / tu-intra-depth=1 / limit-tu=0 / rdoq-level=2 / dynamic-rd=0.00 / no-ssim-rd / signhide / no-tskip / nr-intra=0 / nr-inter=0 / no-constrained-intra / no-strong-intra-smoothing / max-merge=4 / limit-refs=3 / limit-modes / me=3 / subme=7 / merange=92 / temporal-mvp / weightp / weightb / no-analyze-src-pics / deblock=0:0 / no-sao / no-sao-non-deblock / rd=4 / no-early-skip / rskip / no-fast-intra / no-tskip-fast / no-cu-lossless / b-intra / no-splitrd-skip / rdpenalty=0 / psy-rd=2.00 / psy-rdoq=1.00 / no-rd-refine / no-lossless / cbqpoffs=0 / crqpoffs=0 / rc=crf / crf=23.3 / qcomp=0.60 / qpstep=4 / stats-write=0 / stats-read=0 / ipratio=1.40 / pbratio=1.30 / aq-mode=3 / aq-strength=1.00 / cutree / zone-count=0 / no-strict-cbr / qg-size=32 / no-rc-grain / qpmax=69 / qpmin=0 / no-const-vbv / sar=0 / overscan=0 / videoformat=5 / range=0 / colorprim=1 / transfer=1 / colormatrix=1 / chromaloc=0 / display-window=0cll=0,0 / min-luma=0 / max-luma=1023 / log2-max-poc-lsb=8 / vui-timing-info / vui-hrd-info / slices=1 / no-opt-qp-pps / no-opt-ref-list-length-pps / no-multi-pass-opt-rps / scenecut-bias=0.05 / no-opt-cu-delta-qp / no-aq-motion / no-hdr / no-hdr-opt / no-dhdr10-opt / no-idr-recovery-sei / analysis-reuse-level=8 / scale-factor=0 / refine-intra=0 / refine-inter=0 / refine-mv=0 / refine-ctu-distortion=0 / no-limit-sao / ctu-info=0 / no-lowpass-dct / refine-analysis-type=0 / copy-pic=1 / max-ausize-factor=1.0 / no-dynamic-refine / no-single-sei / no-hevc-aq / no-svt / no-field / qp-adaptation-range=1.00

$ cat Vid2
Video : HEVC / Main 10@L4@Main / 3 711 kb/s / 1 920 pixels / 804 pixels 
Writing library : x265
Encoding settings : cpuid=1111039 / frame-threads=3 / numa-pools=8 / wpp / no-pmode / no-pme / no-psnr / no-ssim / log-level=2 / input-csp=1 / input-res=1920x804 / interlace=0 / total-frames=155305 / level-idc=0 / high-tier=1 / uhd-bd=0 / ref=4 / no-allow-non-conformance / no-repeat-headers / annexb / no-aud / no-hrd / info / hash=0 / no-temporal-layers / open-gop / min-keyint=23 / keyint=250 / gop-lookahead=0 / bframes=8 / b-adapt=2 / b-pyramid / bframe-bias=0 / rc-lookahead=25 / lookahead-slices=4 / scenecut=40 / radl=0 / no-splice / no-intra-refresh / ctu=64 / min-cu-size=8 / rect / no-amp / max-tu-size=32 / tu-inter-depth=1 / tu-intra-depth=1 / limit-tu=0 / rdoq-level=2 / dynamic-rd=0.00 / no-ssim-rd / signhide / no-tskip / nr-intra=0 / nr-inter=0 / no-constrained-intra / strong-intra-smoothing / max-merge=3 / limit-refs=3 / limit-modes / me=3 / subme=3 / merange=57 / temporal-mvp / no-frame-dup / no-hme / weightp / no-weightb / no-analyze-src-pics / deblock=0:0 / no-sao / no-sao-non-deblock / rd=4 / selective-sao=0 / no-early-skip / rskip / no-fast-intra / no-tskip-fast / no-cu-lossless / no-b-intra / no-splitrd-skip / rdpenalty=0 / psy-rd=2.00 / psy-rdoq=1.00 / no-rd-refine / no-lossless / cbqpoffs=0 / crqpoffs=0 / rc=crf / crf=18.0 / qcomp=0.60 / qpstep=4 / stats-write=0 / stats-read=0 / ipratio=1.40 / pbratio=1.30 / aq-mode=3 / aq-strength=1.00 / cutree / zone-count=0 / no-strict-cbr / qg-size=32 / no-rc-grain / qpmax=69 / qpmin=0 / no-const-vbv / sar=0 / overscan=0 / videoformat=5 / range=0 / colorprim=2 / transfer=2 / colormatrix=2 / chromaloc=0 / display-window=0 / cll=0,0 / min-luma=0 / max-luma=1023 / log2-max-poc-lsb=8 / vui-timing-info / vui-hrd-info / slices=1 / no-opt-qp-pps / no-opt-ref-list-length-pps / no-multi-pass-opt-rps / scenecut-bias=0.05 / no-opt-cu-delta-qp / aq-motion / no-hdr / no-hdr-opt / no-dhdr10-opt / no-idr-recovery-sei / analysis-reuse-level=5 / scale-factor=0 / refine-intra=0 / refine-inter=0 / refine-mv=1 / refine-ctu-distortion=0 / no-limit-sao / ctu-info=0 / no-lowpass-dct / refine-analysis-type=0 / copy-pic=1 / max-ausize-factor=1.0 / no-dynamic-refine / no-single-sei / no-hevc-aq / no-svt / no-field / qp-adaptation-range=1.00

$ cat Vid3
Video : HEVC / Main 10@L4@Main / 7 903 kb/s / 1 920 pixels / 800 pixels 
Writing library : x265
Encoding settings : cpuid=1111039 / frame-threads=3 / numa-pools=8 / wpp / no-pmode / no-pme / no-psnr / no-ssim / log-level=2 / input-csp=1 / input-res=1920x800 / interlace=0 / total-frames=118448 / level-idc=0 / high-tier=1 / uhd-bd=0 / ref=4 / no-allow-non-conformance / no-repeat-headers / annexb / no-aud / no-hrd / info / hash=0 / no-temporal-layers / open-gop / min-keyint=23 / keyint=250 / gop-lookahead=0 / bframes=8 / b-adapt=2 / b-pyramid / bframe-bias=0 / rc-lookahead=25 / lookahead-slices=4 / scenecut=40 / radl=0 / no-splice / no-intra-refresh / ctu=64 / min-cu-size=8 / rect / no-amp / max-tu-size=32 / tu-inter-depth=1 / tu-intra-depth=1 / limit-tu=0 / rdoq-level=2 / dynamic-rd=0.00 / no-ssim-rd / signhide / no-tskip / nr-intra=0 / nr-inter=0 / no-constrained-intra / strong-intra-smoothing / max-merge=3 / limit-refs=3 / limit-modes / me=3 / subme=3 / merange=57 / temporal-mvp / no-frame-dup / no-hme / weightp / no-weightb / no-analyze-src-pics / deblock=0:0 / no-sao / no-sao-non-deblock / rd=4 / selective-sao=0 / no-early-skip / rskip / no-fast-intra / no-tskip-fast / no-cu-lossless / no-b-intra / no-splitrd-skip / rdpenalty=0 / psy-rd=2.00 / psy-rdoq=1.00 / no-rd-refine / no-lossless / cbqpoffs=0 / crqpoffs=0 / rc=crf / crf=18.0 / qcomp=0.60 / qpstep=4 / stats-write=0 / stats-read=0 / ipratio=1.40 / pbratio=1.30 / aq-mode=3 / aq-strength=1.00 / cutree / zone-count=0 / no-strict-cbr / qg-size=32 / no-rc-grain / qpmax=69 / qpmin=0 / no-const-vbv / sar=0 / overscan=0 / videoformat=5 / range=0 / colorprim=2 / transfer=2 / colormatrix=2 / chromaloc=0 / display-window=0 / cll=0,0 / min-luma=0 / max-luma=1023 / log2-max-poc-lsb=8 / vui-timing-info / vui-hrd-info / slices=1 / no-opt-qp-pps / no-opt-ref-list-length-pps / no-multi-pass-opt-rps / scenecut-bias=0.05 / no-opt-cu-delta-qp / aq-motion / no-hdr / no-hdr-opt / no-dhdr10-opt / no-idr-recovery-sei / analysis-reuse-level=5 / scale-factor=0 / refine-intra=0 / refine-inter=0 / refine-mv=1 / refine-ctu-distortion=0 / no-limit-sao / ctu-info=0 / no-lowpass-dct / refine-analysis-type=0 / copy-pic=1 / max-ausize-factor=1.0 / no-dynamic-refine / no-single-sei / no-hevc-aq / no-svt / no-field / qp-adaptation-range=1.00

The file Required_para.txt has list of parameters (and their values) to be extracted into the output file. We might change this file to add/remove parameters.

$ cat Required_para.txt
input-res
crf
aq-mode
strong-intra-smoothing
bframes
rc-lookahead
me
subme
merange
deblock
selective-sao
psy-rd
psy-rdoq
ctu
rdoq-level
max-merge
ref
max-tu-size
transfer
colormatrix
colorprim

The output's first line is the list of parameters/values required (comma separated). Each Vid's medianfo will fit in each row with their parameter's values. The 3rd column BitRate is the value of first line's "kb/s" entry of VidX file. 7th column strong-intra-smoothing value should be either 0 (if input has no-strong-intra-smoothing ) OR 1 (if input has strong-intra-smoothing ).

Output file should be like below. I filled-in only few columns.

$ cat Output_Sample
Sl#,VideoName,BitRate,input-res,crf,aq-mode,strong-intra-smoothing,bframes,rc-lookahead,me,subme,merange,deblock,selective-sao,psy-rd,psy-rdoq,ctu,rdoq-level,max-merge,ref,max-tu-size,transfer,colormatrix,colorprim
1,Vid1,5750,1920x816,23.3,3,0,8
2,Vid2,3711,1920x804,18.0,3,1,8
3,Vid3,7903,1920x804,18.0,3,1,8

$

Thank you!

Hi prvnrk,

With 252 posts here you should know that UNIX.com is not a free script writing service.

You need to post your code what you wrote to accomplish your objective(s), the sample input and the required output, and any error messages.

You seemed to have done all of the above, EXCEPT you did not provide any code you wrote yourself to solve this problem.

Please, show your work and do your own work and ask for help. Do not ask others to do your work for you.

Your desired output doesn't seem to match your requirements / parameters / heading?

@Neo - Thanks, Always I tried and shared my attempts here (if you look at my previous posts) but this time also I tried and gave it up after 80% completion. Now I completed my script which I don't like (as always) as it's messy. This does my job but not perfect.

@RudiC - Thanks I missed BitRate field. now I fixed.

Please see my Script and other files, Appreciate a Better version. Thannks!!

$ cat Final_Format.sh
#!/usr/bin/bash
cd /home/admin/HEVC1
echo "Sl#,VideoName,BitRate,`cat Required_para.txt | paste -sd "," -`" >/home/admin/HEVC1/MedInfo_Output.csv
countt=0
for VidX in Vid*
do
        countt=`expr $countt + 1`
        echo "${countt}" >/tmp/Req_Values_Vertical.txt
        echo "${VidX}" >>/tmp/Req_Values_Vertical.txt
        VidBitRate=`grep "HEVC " ${VidX} |grep Video |awk -F"/" '{print $3}' |sed 's/[^0-9]*//g'`
        echo "${VidBitRate}" >>/tmp/Req_Values_Vertical.txt

        grep "Encoding settings" ${VidX} |sed 's/\//\n/g' >/tmp/All_Values.txt
        grep "Encoding settings" ${VidX} |sed 's/\//\n/g' |grep = >/tmp/All_Values2.txt

                for param in `cat Required_para.txt` ; do
                grep "${param}"= /tmp/All_Values2.txt|head -1 >>/tmp/Req_Values_Vertical.txt
                done

        sed -i.bak 's/ //g' /tmp/Req_Values_Vertical.txt

        if [ `grep no-strong-intra-smoothing ${VidX} |wc -l` -gt 0 ]; then
        sed -i.bakk '/^aq-mode=.*/a strong-intra-smoothing=0' /tmp/Req_Values_Vertical.txt
        else
        sed -i.bakk '/^aq-mode=.*/a strong-intra-smoothing=1' /tmp/Req_Values_Vertical.txt
        fi
		
		if [ `grep selective-sao ${VidX} |wc -l` -eq 0 ]; then
        sed -i.bakkk '/^colorprim=.*/a selective-sao=NA' /tmp/Req_Values_Vertical.txt
        fi

        awk -F"=" '{print $NF}' /tmp/Req_Values_Vertical.txt >/tmp/Req_ValuesOnly_Vertical.txt
        cat /tmp/Req_ValuesOnly_Vertical.txt | paste -sd "," - >/tmp/Req_ValuesOnly_Horizontal.txt
        cat /tmp/Req_ValuesOnly_Horizontal.txt >>/home/admin/HEVC1/MedInfo_Output.csv
done
$ pwd
/home/admin/HEVC1
$ ./Final_Format.sh
$ cat MedInfo_Output.csv
Sl#,VideoName,BitRate,input-res,crf,aq-mode,strong-intra-smoothing,bframes,rc-lookahead,me,subme,merange,deblock,psy-rd,psy-rdoq,ctu,rdoq-level,max-merge,ref,max-tu-size,transfer,colormatrix,colorprim,selective-sao
1,Vid1,5750,1920x816,23.3,3,0,8,100,3,7,92,0:0,2.00,1.00,32,2,4,5,16,1,1,1,NA
2,Vid2,3711,1920x804,18.0,3,1,8,25,3,3,57,0:0,2.00,1.00,64,2,3,4,32,2,2,2,0
3,Vid3,7903,1920x800,18.0,3,1,8,25,3,3,57,0:0,2.00,1.00,64,2,3,4,32,2,2,2,0
$ 

$

$ cat Required_para.txt
input-res
crf
aq-mode
strong-intra-smoothing
bframes
rc-lookahead
me
subme
merange
deblock
psy-rd
psy-rdoq
ctu
rdoq-level
max-merge
ref
max-tu-size
transfer
colormatrix
colorprim
selective-sao

Try

awk -F/ '
NR == 1         {printf "Sl#,VideoName,BitRate"
                }
FNR == NR       {PRM[NR] =  "( |no-)" $0 "=*[^ /]*"
                 MX = NR
                 printf ",%s", $0
                 next
                }
!HDFIN          {printf RS
                 HDFIN = 1
                }
FNR == 1        {gsub (" ", "", $3)
                 printf "%d,%s,%d", ++FCNT, FILENAME, $3+0
                }
FNR == 3        {for (i=1; i<=MX; i++)  {match ($0, PRM)
                                         if (2 == split (substr ($0, RSTART, RLENGTH), TMP, "=")) printf ",%s", TMP[2]
                                           else printf ",%d", (! (TMP[1] ~ /^no/))
                                        }
                 printf RS
                }

' Required_para.txt OFS=, vid?
Sl#,VideoName,BitRate,input-res,crf,aq-mode,strong-intra-smoothing,bframes,rc-lookahead,me,subme,merange,deblock,selective-sao,psy-rd,psy-rdoq,ctu,rdoq-level,max-merge,ref,max-tu-size,transfer,colormatrix,colorprim
1,vid1,5750,1920x816,23.3,3,0,8,100,3,7,92,0:0,1,2.00,1.00,32,2,4,5,16,1,1,1
2,vid2,3711,1920x804,18.0,3,1,8,25,3,3,57,0:0,0,2.00,1.00,64,2,3,4,32,2,2,2
3,vid3,7903,1920x800,18.0,3,1,8,25,3,3,57,0:0,0,2.00,1.00,64,2,3,4,32,2,2,2
1 Like

@RudiC - Many thanks!! you are a god of awk :slight_smile:

The only issue with your script is that if MediaInfo of a video doesn't have entry of selective-sao , it displays 1 value for it. Could we set it to NA please?

$ grep selective-sao Vid1

$

Needless to say that your script runs million times faster than mine because of the efficient use of awk . I tried several times to learn awk in depth but never found it easy. Please suggest some material to learn it easy way.

Thanks again!

NA was not specified. Try

awk -F/ '
NR == 1         {printf "Sl#,VideoName,BitRate"
                }
FNR == NR       {PRM[NR] =  "( |no-)" $0 "=*[^ /]*"
                 MX = NR
                 printf ",%s", $0
                 next
                }
!HDFIN          {printf RS
                 HDFIN = 1
                }
FNR == 1        {gsub (" ", "", $3)
                 printf "%d,%s,%d", ++FCNT, FILENAME, $3+0
                }
FNR == 3        {for (i=1; i<=MX; i++)  {match ($0, PRM)
                                         n = split (substr ($0, RSTART, RLENGTH), TMP, "=")
                                         if (n == 2)            printf ",%s", TMP[2]
                                           else if (n == 1)     printf ",%d", (! (TMP[1] ~ /^no/))
                                                  else          printf ",NA"
                                        }
                 printf RS
                }

' Required_para.txt OFS=, vid?
Sl#,VideoName,BitRate,input-res,crf,aq-mode,strong-intra-smoothing,bframes,rc-lookahead,me,subme,merange,deblock,selective-sao,psy-rd,psy-rdoq,ctu,rdoq-level,max-merge,ref,max-tu-size,transfer,colormatrix,colorprim,selective-sao
1,vid1,5750,1920x816,23.3,3,0,8,100,3,7,92,0:0,NA,2.00,1.00,32,2,4,5,16,1,1,1,NA
2,vid2,3711,1920x804,18.0,3,1,8,25,3,3,57,0:0,0,2.00,1.00,64,2,3,4,32,2,2,2,0
3,vid3,7903,1920x800,18.0,3,1,8,25,3,3,57,0:0,0,2.00,1.00,64,2,3,4,32,2,2,2,0

No easy way. Read and exercise. What I did to learn awk is solve problems e.g. found in these forums and analyse / understand solutions given, and use the man page. I have to admit I have a history in other programming languages so wasn't a true novice.

1 Like