Split into columns based on the parameter and use & as delimiter

Here is my source, i have million lines like this on a file.

disp0201.php?poc=4060&roc=1&ps=R&ooc=13&mjv=6&mov=5&rel=5&bod=155&oxi=2&omj=5&ozn=1&dav=20&cd=&daz=& drc=&mo=&sid=&lang=EN&loc=JPN

I want to split this into columns in order to load in database, anything starts with"&mjv=6" as first column &rel=5 as second column,&oxi=2 as third column like wise...for entire param list. But all values in parameters may change. So I guess i can use wildcard, but i need to use "&" as delimiter.

Please help me on this
EM

Post desired output for that line.

My Source fields are aligned improperly back & forth

poc=4060&mjv=6&mov=5&roc=1&ps=R&ooc=13&rel=5&bod=155&oxi=2&omj=5&ozn=1&dav=20&cd=&daz=&drc=&mo=&sid=&lang=ja&loc=JPN
poc=42160&mjv=7&mov=5&roc=1&ps=B&rel=5&bod=155&ooc=19&oxi=5&omj=5&ozn=1&dav=20&cd=&daz=&drc=&loc=USA&mo=&sid=&lang=EN
poc=4040&mjv=6&mov=7&roc=1&ps=R&ooc=73&rel=5&oxi=2&omj=5&ozn=3&dav=20&bod=126&cd=&daz=&drc=&mo=&sid=&loc=GBR&lang=EN

In order to do some analysis over the data I need to organize and load in database. My desired output should be like this

poc=4060	&ooc=13	&mjv=6	&omj=5	&mov=5	&ps=R	&rel=5	&bod=155	&oxi=2	&ozn=1	lang=ja	&loc=JPN
poc=42160	&ooc=19	&mjv=7	&omj=5	&mov=5	&ps=B	&rel=5	&bod=155	&oxi=5	&ozn=1	lang=EN	&loc=USA	
poc=4040	&ooc=73	&mjv=6	&omj=5	&mov=7	&ps=R	&rel=5	&bod=126	&oxi=2	&ozn=3	lang=EN	&loc=GBR

#!/bin/bash
while IFS='&' read -a L
do
for i in ${!L[@]}; do
eval "${L[$i]}"
done
for V in poc ooc mjv omj mov ps rel bod oxi ozn lang loc
do
echo -en "$V=${!V}\t"
done
echo
done <infile

$ cat infile
poc=4060&mjv=6&mov=5&roc=1&ps=R&ooc=13&rel=5&bod=155&oxi=2&omj=5&ozn=1&dav=20&cd=&daz=&drc=&mo=&sid=&lang=ja&loc=JPN
poc=42160&mjv=7&mov=5&roc=1&ps=B&rel=5&bod=155&ooc=19&oxi=5&omj=5&ozn=1&dav=20&cd=&daz=&drc=&loc=USA&mo=&sid=&lang=EN
poc=4040&mjv=6&mov=7&roc=1&ps=R&ooc=73&rel=5&oxi=2&omj=5&ozn=3&dav=20&bod=126&cd=&daz=&drc=&mo=&sid=&loc=GBR&lang=EN

output

poc=4060	ooc=13	mjv=6	omj=5	mov=5	ps=R	rel=5	bod=155	oxi=2	ozn=1	lang=ja	loc=JPN
poc=42160	ooc=19	mjv=7	omj=5	mov=5	ps=B	rel=5	bod=155	oxi=5	ozn=1	lang=EN	loc=USA
poc=4040	ooc=73	mjv=6	omj=5	mov=7	ps=R	rel=5	bod=126	oxi=2	ozn=3	lang=EN	loc=GBR
1 Like

Wow...Simply amazing...If you could explain the syntax that would be great!

while IFS='&' read -a L
# set the field separator to '&' and read the line as an array
# each elemnt of te array corresponds to an axpression between '&'
# L[0]="poc=4060", L[1]="mjv=6", L[2]="mov=5" and so on
# The line is plitted into an array with '&' as separator.
do
for i in ${!L[@]}; do # for each index in the array
eval "${L[$i]}" # "evaluate", which means execute the command in the string
done
for V in poc ooc mjv omj mov ps rel bod oxi ozn lang loc
do
echo -en "$V=${!V}\t" # ${!<variable>} is indirect reference
done
echo
done <infile

for i in ${!L[@]}; do
   eval "${L[$i]}"string
done

could also be done like

for E in ${L[@]}; do   # for each element of the array
   eval "$E"
done

as we don't need the indexes of the array

--removed--

Thanks Frans for your explanation...Its good day for me to learn such kind of scripting. Thanks a lot.

awk -F\& 'BEGIN{m=split("poc&ooc&mjv&omj&mov&ps&rel&bod&oxi&ozn&lang&loc",T)}
         {s=x;for(i=1;i<=m;i++)for(j=1;j<=NF;j++)if($j~T)s=s (s==x?x:OFS FS) $j;print s}' OFS='\t' infile
awk -F\& '{r=x;for(i=split("poc&ooc&mjv&omj&mov&ps&rel&bod&oxi&ozn&lang&loc",F);i;i--)
for(j=0;j<NF;)if($++j~F)r=$j (r?"\t"FS r:x);print r}' infile

The use of eval to execute input data is dangerous stuff because it allows a malicious user with access to the data file to execute any code as the user who executes the script.

Alternative shell script:

# oldIFS=$IFS
IFS='&'
while read line
do
  for f in poc ooc mjv omj mov ps rel bod oxi ozn lang loc
  do
    for i in $line
    do
      case $i in
        $f=*) printf "%s\t" "$i"
      esac
    done
  done
  echo
done <infile
# IFS=$oldIFS
2 Likes

When going with the above script if any of the column/search string is not available (on f) I need to print NULL for the variable.

say for example if i don't find 'rel' & 'bod' in the line while reading the line, I want print rel=NULL and bod=NULL

I am trying out with lot of combination ..not able to achieve it.

Could this help ?
#!/bin/bash
while read L
do
for V in poc ooc mjv omj mov ps rel bod oxi ozn lang loc drc
do
Val=$(echo "$L" | grep -oE "$V=[^&]" | sed 's/.=//')
echo -en "$V=${Val:-NULL}\t"
done
echo
done <$INFILE

# oldIFS=$IFS
IFS='&'
while read line
do
  for f in poc ooc mjv omj mov ps rel bod oxi ozn lang loc
  do
    val=
    for i in $line
    do
      case $i in
        $f=*) val=$i; break
      esac
    done
    printf "%s\t" "${val:-$f=NULL}"
  done
  echo
done <infile
# IFS=$oldIFS