Extract strings from a file and display them in CSV format

grajp002 · March 17, 2010, 10:40pm

Hello all,
I have a file whose data looks something like this:

[id=4][name=alpha][state=ny][city=abc]
[id=2][name=beta][state=wv][city=pqr]
[id=3][name=theta][state=ca][city=xyz]
[id=5][name=gamma][state=tx][city=jkl]
[id=1][name=psi][state=ga][city=zzz]

I want to extract just the id, name and city fields in CSV format and sort the lines by id. The output should look like this:

1,psi,zzz
2,beta,pqr
3,theta,xyz
4,alpha,abc
5,gamma,jkl

I could extract the individual string values using sed, but could not arrange them in CSV format. Could someone please help?

jim_mcnamara · March 17, 2010, 11:16pm

How about awk?

awk '{gsub(/\[|\]| |=/, " "); print $2 "," $4 "," $8}'  oldfile > newfile.csv

murugaperumal · March 17, 2010, 11:23pm

 
sed -r 's/\[[a-z]+=([0-9]+)\]\[[a-z]+=([a-z]+)\].*\[[a-z]+=([a-z]+)\]/\1,\2,\3/' filename  | sort
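A caveat that applies to several replies in this thread: plain sort compares lines as text, so an id of 10 (not in the sample data) would land between 1 and 2. A minimal sketch of the same sed extraction followed by a numeric sort on the first comma-separated field. The name extract_sorted is only illustrative, and -E is used as the more portable spelling of GNU sed's -r:

```shell
#!/bin/sh
# Sketch only: the sed from the reply above, followed by a numeric sort.
# Assumes each line looks like [id=N][name=...][state=...][city=...].
# extract_sorted is a hypothetical helper name.
extract_sorted() {
    # -t, splits on commas; -k1,1n sorts on field 1 only, numerically,
    # so id 10 comes after 2 instead of between 1 and 2.
    sed -E 's/\[[a-z]+=([0-9]+)\]\[[a-z]+=([a-z]+)\].*\[[a-z]+=([a-z]+)\]/\1,\2,\3/' "$@" |
        sort -t, -k1,1n
}
```

Usage: extract_sorted filename > newfile.csv (with no file argument it reads standard input).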

dennis.jacob · March 17, 2010, 11:44pm

Assuming the number of fields is fixed:

perl -lane 'print  join ",", (split /=|]|\[/,$_)[2,5,11];' file | sort
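The one-liners so far pick fields by position, which is why the reply above has to assume a fixed layout. A sketch that looks fields up by key instead, so their order and count on the line no longer matter. It assumes every line is a run of [key=value] pairs and that no value contains ']' or '='; pick_fields is a hypothetical name:

```shell
#!/bin/sh
# Sketch: parse "[key=value]" pairs into an awk array and print by key.
# Assumes values contain no "]" or "=". pick_fields is a hypothetical name.
pick_fields() {
    awk '
    {
        split("", kv)                      # portable way to empty the array
        n = split($0, part, "]")           # "[id=4][name=alpha]..." -> "[id=4" "[name=alpha" ...
        for (i = 1; i <= n; i++) {
            sub(/^\[/, "", part[i])        # drop the leading "["
            eq = index(part[i], "=")
            if (eq > 0)                    # skips the empty piece after the last "]"
                kv[substr(part[i], 1, eq - 1)] = substr(part[i], eq + 1)
        }
        print kv["id"] "," kv["name"] "," kv["city"]
    }' "$@" | sort -t, -k1,1n
}
```

For example: pick_fields file > newfile.csv. A missing key simply prints as an empty column.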

ungalnanban · March 17, 2010, 11:45pm

Try the following sed command:

sed -r 's/(.*)(=[a-z0-9]*)(.*)(=[a-z0-9]*)(.*)(=[a-z0-9]*)(.*)(=[a-z0-9]*)(.*)/\2,\4,\8/g;s/=//g' Input_file | sort

grajp002 · March 18, 2010, 1:10am

Thank you all for your prompt responses.

What should I do if my data looks like this?

[id=4][name=alpha][state=ny][city=abc]
[employee=234]
[id=2][name=beta][state=wv][city=pqr]
[employee=254]
[id=3][name=theta][state=ca][city=xyz]
[employee=432]
[id=5][name=gamma][state=tx][city=jkl]
[employee=239]
[id=1][name=psi][state=ga][city=zzz]
[employee=222]

and I want the output to look like this:

1,222,psi,zzz
2,254,beta,pqr
3,432,theta,xyz
4,234,alpha,abc
5,239,gamma,jkl

karthigayan · March 18, 2010, 2:11am

Try this Perl script:

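#!/usr/bin/perl

use strict;
use warnings;

# Input lines alternate: an [id=..][name=..][state=..][city=..] record,
# followed by its [employee=..] line.
open my $fh, '<', 'file' or die "Cannot open 'file': $!";
my $count = 1;
my ($id, $name, $state, $city);
my %hash = ();
while (<$fh>)
{
    if (($count % 2) == 0)
    {
        # Even line: keep only the employee number.
        s/^.+=([0-9]+)]$/$1/;
        chomp;
        push @{$hash{$id}}, $_, $name, $city;
    }
    else
    {
        # Odd line: capture id, name, state and city.
        if (/\[.+=([0-9]+).+=(.+)].+=(.+)].+=(.+)]$/)
        {
            $id    = $1;
            $name  = $2;
            $state = $3;
            $city  = $4;
        }
    }
    $count++;
}
close $fh;

# Sort numerically by id and print id,employee,name,city.
foreach (sort { $a <=> $b } keys %hash)
{
    print join(",", $_, @{$hash{$_}}), "\n";
}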

Here, 'file' is the input file.

murugaperumal · March 18, 2010, 2:18am

Try the following code

 
sed '1,$N;s/\n//g'  file | sed -r 's/\[[a-z]+=([0-9]+)\]\[[a-z]+=([a-z]+)\].*\[[a-z]+=([a-z]+)\]\[[a-z]+=([0-9]+)\]/\1,\4,\2,\3/' | sort
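Gluing each record line to the employee line that follows it can also be done with paste: with two '-' operands it reads two consecutive lines of standard input per output line, and '\0' in the -d list means "no delimiter". A sketch, assuming the file strictly alternates record line and employee line; join_pairs is a hypothetical name and it reads standard input only:

```shell
#!/bin/sh
# Sketch: join line pairs with paste, then reuse the sed from the reply above.
# Assumes strict alternation: [id=..][name=..][state=..][city=..] then [employee=..].
# join_pairs is a hypothetical helper name.
join_pairs() {
    # "- -" reads two lines of stdin per output line; "\0" = empty delimiter.
    paste -d '\0' - - |
        sed -E 's/\[[a-z]+=([0-9]+)\]\[[a-z]+=([a-z]+)\].*\[[a-z]+=([a-z]+)\]\[[a-z]+=([0-9]+)\]/\1,\4,\2,\3/' |
        sort -t, -k1,1n
}
```

Usage: join_pairs < file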

dennis.jacob · March 18, 2010, 2:38am

Try:

perl -lne '$s .= $_; END { foreach (split /\[id=/, $s) { next unless length; print join ",", (split /=|]|\[/)[0,12,3,9] } }' file | sort

malcomex999 · March 18, 2010, 2:39am

Modifying jim_mcnamara's script:

awk '{gsub(/\[|\]| |=/, " ");if($1=="id"){id=$2;name=$4;city=$8}else{print id,$2,name,city}}' OFS="," infile | sort
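A variant that does not depend on field positions: store each key=value pair in an awk array and print the CSV line once an employee key has been seen. A sketch, assuming values contain no ']' or '=' and that each record line comes before its employee line; records_to_csv is a hypothetical name:

```shell
#!/bin/sh
# Sketch: key-based parsing of alternating record/employee lines.
# Values are remembered by key until an "employee" key shows up.
# records_to_csv is a hypothetical helper name.
records_to_csv() {
    awk '
    {
        n = split($0, part, "]")
        for (i = 1; i <= n; i++) {
            sub(/^\[/, "", part[i])          # strip the leading "["
            eq = index(part[i], "=")
            if (eq > 0)
                kv[substr(part[i], 1, eq - 1)] = substr(part[i], eq + 1)
        }
        if ("employee" in kv) {              # record is complete: print it
            print kv["id"] "," kv["employee"] "," kv["name"] "," kv["city"]
            split("", kv)                    # clear kv for the next record
        }
    }' "$@" | sort -t, -k1,1n
}
```

Usage: records_to_csv infile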

frans · March 18, 2010, 3:10am

In bash:

#!/bin/bash
IFS="]"
while true
do
    read ID NA ST CI && read EM || break
    echo "${ID#*=},${EM#*=},${NA#*=},${CI#*=}"
done < infile | sort
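In the script above, ${ID#*=} removes the shortest leading part of $ID that matches '*=', i.e. everything up to and including the first '='. A tiny illustration with a hypothetical helper, value_of:

```shell
#!/bin/sh
# Illustration of the ${VAR#pattern} expansion used above:
# "#" strips the shortest leading match of the pattern, so "*=" removes
# everything up to and including the first "=".
# ("##" would strip the longest match, keeping only text after the last "=".)
# value_of is a hypothetical helper name.
value_of() {
    printf '%s\n' "${1#*=}"
}
```

For example, value_of '[id=4' prints 4, and value_of 'a=b=c' prints b=c.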

rdcwayx · March 19, 2010, 12:56am

You still need to sort the result.

awk '{gsub(/\[|\]| |=/, " "); print $2,$4,$8 |"sort -n"}' OFS=","  urfile

kshji · March 19, 2010, 1:38am

Turn the brackets into spaces and each line becomes a list of shell assignments (id=4 name=alpha ...), which eval can turn into variables.
After each employee line, print the variables.

#!/bin/sh
# Works in ksh, bash, dash or any POSIX sh.
# eval executes the input, so use this only on trusted files.
employee=0
tr '][' '  ' < infile | while read -r line
do
        eval "$line"
        [ "$employee" = 0 ] && continue  # id line: read the employee line next
        echo "$id,$employee,$name,$city"
        employee=0
done | sort -t "," -k 1,1n
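Because eval executes whatever text the input contains, the script above is only safe on trusted files. A sketch of an eval-free variant with the same structure, splitting each key=value word with parameter expansion instead. It assumes values contain no spaces, brackets, '=' or shell wildcard characters; records_no_eval is a hypothetical name:

```shell
#!/bin/sh
# Sketch: same idea as the eval script, but each "key=value" word is split
# with parameter expansion instead of being executed.
# The ( ) body runs in a subshell so the variables do not leak to the caller.
records_no_eval() (
    id= name= city= employee=
    tr '][' '  ' | while read -r line
    do
        for pair in $line            # unquoted on purpose: split on whitespace
        do
            key=${pair%%=*}          # text before the first "="
            val=${pair#*=}           # text after the first "="
            case $key in
                id) id=$val ;;
                name) name=$val ;;
                city) city=$val ;;
                employee) employee=$val ;;
            esac
        done
        [ -z "$employee" ] && continue   # record line: wait for its employee line
        echo "$id,$employee,$name,$city"
        employee=
    done | sort -t, -k1,1n
)
```

Usage: records_no_eval < infile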