Converting unstructured data to structured data

Hi,

Can someone help in converting the below unstructured data to a CSV format please.

{
  "branchId" : "BNSFGDJNSJG-73264HB-132131BNHJFSDG",
  "branchName" : "NEWYORK-SSDF",
  "branchProductId" : "72Y5HFHSF7H3RUNAWEF",
  "PreferenceId" : "BASDBVcbzcYHcb",
  "emailId" : "hjbasf7hh4aBSDFU@mymail.com",
  "firstName" : "FNM",
  "lastName" : "LNM",
  "middleName" : "",
  "userId" : "LSDFJDSJMDFG238487ybsgfd"
}

Output required:

branchId,branchName,branchProductId,PreferenceId,emailId,firstName,lastName,middleName,userId
"BNSFGDJNSJG-73264HB-132131BNHJFSDG","NEWYORK-SSDF","72Y5HFHSF7H3RUNAWEF","BASDBVcbzcYHcb", "hjbasf7hh4aBSDFU@mymail.com", "FNM","LNM","","LSDFJDSJMDFG238487ybsgfd"

Thanks in advance
Naveen

Would this work?

perl -ne '@c=/(\w+)"\s+:\s+("[^"]+")/ and push @h, $c[0] and push @r, $c[1]; END{$"=","; print "@h\n@r\n"}' naveen.file

Output:

branchId,branchName,branchProductId,PreferenceId,emailId,firstName,lastName,userId
"BNSFGDJNSJG-73264HB-132131BNHJFSDG","NEWYORK-SSDF","72Y5HFHSF7H3RUNAWEF","BASDBVcbzcYHcb","hjbasf7hh4aBSDFU@mymail.com","FNM","LNM","LSDFJDSJMDFG238487ybsgfd"
1 Like

Works like a treat, thanks Aia :slight_smile: but why is the "middleName" field is missed ? is that because it doesn't contain any value.

#!/bin/bash

header=(); values=()

while IFS=":" read var val
do
  if [[ "$var" =~ "\"" ]]; then

  var="${var//\"/}"; var="${var% }"
   [[ "${#header[@]}" -eq "0" ]] && header=("$var") || header=("${header[@]}","$var")

  val="${val# }"; val="${val%,}"
   [[ "${#values[@]}" -eq "0" ]] && values=("$val") || values=("${values[@]}","$val")

  fi

done <inputfile

printf "%s\n%s\n" "${header[@]}" "${values[@]}"
1 Like

Thanks junior-helper. It works :slight_smile:

It is because I did not pay enough attention. Apologies.
Please, try the following:

perl -ne '@c=/(\w+)"\s+:\s+("[^"]*")/ and push @h, $c[0] and push @r, $c[1]; END{$"=","; print "@h\n@r\n"}' naveen.file

Output:

branchId,branchName,branchProductId,PreferenceId,emailId,firstName,lastName,middleName,userId
"BNSFGDJNSJG-73264HB-132131BNHJFSDG","NEWYORK-SSDF","72Y5HFHSF7H3RUNAWEF","BASDBVcbzcYHcb","hjbasf7hh4aBSDFU@mymail.com","FNM","LNM","","LSDFJDSJMDFG238487ybsgfd"
1 Like

Thanks Aia. Also, if we have two or more sets of data instead of one as mentioned below

{
  "branchId" : "BNSFGDJNSJG-73264HB-132131BNHJFSDG",
  "branchName" : "NEWYORK-SSDF",
  "branchProductId" : "72Y5HFHSF7H3RUNAWEF",
  "PreferenceId" : "BASDBVcbzcYHcb",
  "emailId" : "hjbasf7hh4aBSDFU@mymail.com",
  "firstName" : "FNM",
  "lastName" : "LNM",
  "middleName" : "",
  "userId" : "LSDFJDSJMDFG238487ybsgfd"
}

{
  "branchId" : "BNSFGDJNSJG-73264HB-132131BNHJFSDG",
  "branchName" : "NEWYORK-SSDF",
  "branchProductId" : "72Y5HFHSF7H3RUNAWEF",
  "userId" : "LSDFJDSJMDFG238487ybsgfd"
}

And what changes are required to get the output as below

branchId,branchName,branchProductId,PreferenceId,emailId,firstName,lastName,middleName,userId
"BNSFGDJNSJG-73264HB-132131BNHJFSDG","NEWYORK-SSDF","72Y5HFHSF7H3RUNAWEF","BASDBVcbzcYHcb", "hjbasf7hh4aBSDFU@mymail.com", "FNM","LNM","","LSDFJDSJMDFG238487ybsgfd"

branchId,branchName,branchProductId,userId
"BNSFGDJNSJG-73264HB-132131BNHJFSDG","NEWYORK-SSDF","72Y5HFHSF7H3RUNAWEF","LSDFJDSJMDFG238487ybsgfd"

In the meantime, you might want to try this... I'd like to hear from you if it works in general, even if you prefer the perl thingy :slight_smile:

#!/bin/bash

header=(); values=(); start=0

while IFS=":" read var val
do

 if [[ "$var" = "{" && "$start" = "0" ]]; then # "new" set detected
  start=1
  continue

 elif [[ "$var" = "" ]]; then
  echo
  continue

 elif [[ "$var" = "}" ]]; then # end of a set
  printf "%s\n%s\n" "${header[@]}" "${values[@]}"
  start=0; header=(); values=()
  continue

 else # process stuff within a set

  var="${var//\"/}"; var="$(echo $var)"
   [[ "${#header[@]}" -eq "0" ]] && header=("$var") || header=("${header[@]}","$var")

  val="${val# }"; val="${val%,}"
   [[ "${#values[@]}" -eq "0" ]] && values=("$val") || values=("${values[@]}","$val")

 fi

done <inputfile

As long as the file looks like you have posted, this might do it:

perl -nle 'BEGIN{$/="\n\n"; $"=","} @h=@r=(); while(/(\w+)"\s+:\s+("[^"]*")/g){push @h,$1; push @r,$2} print "@h\n@r\n"' naveen.file

Output:

branchId,branchName,branchProductId,PreferenceId,emailId,firstName,lastName,middleName,userId
"BNSFGDJNSJG-73264HB-132131BNHJFSDG","NEWYORK-SSDF","72Y5HFHSF7H3RUNAWEF","BASDBVcbzcYHcb","hjbasf7hh4aBSDFU@mymail.com","FNM","LNM","","LSDFJDSJMDFG238487ybsgfd"

branchId,branchName,branchProductId,userId
"BNSFGDJNSJG-73264HB-132131BNHJFSDG","NEWYORK-SSDF","72Y5HFHSF7H3RUNAWEF","LSDFJDSJMDFG238487ybsgfd"

A little bit playin with ruby....

#!/usr/bin/env ruby 

# call it with: ./thisfile.rb data.txt

word  = '[^"]*'
block = '{[^}]+}'

File.open(ARGV[0]).read.scan(/#{block}/m).map{ |current_block|
        puts current_block.scan(/^\s*"(#{word})"/m).map{|key|key[0]}.join(",") + "\n" +
             current_block.scan(/^\s*"#{word}" : ("#{word}")/m).map{|val|val[0]}.join(",")  
        }