Parsing Bulk Data

Hi all,

I am looking for a smart way to parse the files in a directory that receives around 2,000,000 files in a single day.

find works for me, but it takes a long time, sometimes even a whole day, which does not help.

So, can anyone suggest a smarter way to get my desired data out of those 2,000,000 files?

Can you post the find command that you are using?

Sorry for the delayed response.

Command:

find . | xargs egrep "TargetSystem" | nawk -F ":" '{print $1"|"$5}' | cut -c '25-27,41-' | sort | uniq -c | sort -u

Perhaps that could be sped up a little. What is the format of those files? Could you provide a sample?
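A few things stand out even before seeing the data: the bare find . feeds directory names to egrep as well as files, a plain xargs breaks on filenames containing spaces, and the nawk and cut stages can be folded into a single awk pass. A minimal sketch of what I mean, assuming GNU find, xargs, and grep (your use of nawk suggests Solaris, where these are often installed as gfind, gxargs, and ggrep); the -P 4 parallelism level is an assumption to tune to your CPU count:

# Only regular files; NUL separators so odd filenames cannot break the pipe.
find . -type f -print0 |
# Run four greps in parallel; -H keeps the filename prefix even when a
# batch happens to contain only one file. Output order does not matter
# here because everything is sorted downstream.
xargs -0 -P 4 grep -E -H "TargetSystem" |
# One awk pass replaces the old nawk + cut stages: rebuild the same
# "$1|$5" string, then take characters 25-27 and 41- as cut -c did.
awk -F ":" '{ s = $1 "|" $5; print substr(s, 25, 3) substr(s, 41) }' |
# Unchanged from the tail of your original pipeline.
sort | uniq -c | sort -u

And if the directory keeps accumulating the whole day's files, the biggest win may simply be not re-reading old ones: touch a timestamp file after each run and add a -newer test to find so the next run only visits files created since.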

They are in XML format.

Could you provide a short sample (anonymized if need be)?
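In the meantime, one more thought: fixed character positions are fragile against XML, so once we know the structure, an XML-aware extractor may prove more robust. Purely illustrative, assuming xmlstarlet is available and a hypothetical <TargetSystem> element (I am only guessing the element name from your grep pattern):

find . -type f -print0 |
# Extract the value of every TargetSystem element, one per line;
# parse errors from non-XML files are discarded on stderr.
xargs -0 xmlstarlet sel -t -v '//TargetSystem' -n 2>/dev/null |
sort | uniq -c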