I have a CSV file that contains several million lines.
The first line (the header) repeats at every 50,000th line. I want to remove every duplicate header from the second occurrence onward (the first line must not be removed).
I don't want to match on any pattern from the header, because I have hundreds of such files. For each one I want to compare the first line with the remaining lines and remove the duplicates from the second occurrence onward.
I can't use 'uniq' because the files have other duplicate (non-header) lines that I still need to keep.
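To make the requirement concrete, here is a minimal sketch of the comparison described above, assuming a POSIX-style awk is available. The function name dedupe_header and the sample rows are made up for illustration. It remembers line 1, prints it once, and afterwards prints only lines that differ from it, so duplicate non-header rows are left alone.

```shell
# dedupe_header is a hypothetical helper name, not part of any existing script.
# It keeps the first line and drops every later line identical to it.
dedupe_header() {
    awk 'NR == 1 { header = $0; print; next } $0 != header' "$@"
}

# Made-up sample: the header repeats once, and a data row is duplicated on purpose.
result=$(printf '%s\n' 'id,name' '1,a' '1,a' 'id,name' '2,b' | dedupe_header)
printf '%s\n' "$result"
```

Nothing in the awk program mentions the header text itself, so the same command should apply unchanged to each file, for example: dedupe_header input.csv > output.csv (both file names are placeholders).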
---------- Post updated 01-07-14 at 09:18 PM ---------- Previous update was 01-06-14 at 09:27 PM ----------
Hi bartus11,
The same command works fine from the command line. However, I have a script, Run_queries.sh, shown below:
/*
#!/bin/ksh
if [ "$4" == "FieldTickets" ]; then
export INTERNAL_CODE="Field Tickets";
echo $INTERNAL_CODE
else
export INTERNAL_CODE=$4;
fi
export DATE=`date +"%d%b%Y"`
cd $HOME/Prod_report_queries/reports/$DATE
sqlplus xyz/xxxx@xxx << EOF
set pagesize 50000
set heading on
set feedback off
set trimspool on
set trim on
set linesize 32767
set termout off
set verify off
set underline off
set colsep '","'
set headsep '","'
define date_from = '$2'
define date_to = '$3'
define company_id = '$1'
define internal_code='$INTERNAL_CODE'
define user_list = '$7'
SPOOL $5$DATE.csv
@$HOME/Prod_report_queries/queries/$6
SPOOL OFF
!sed '/^$/d' $5$DATE.csv > temp2.csv
!sed '/SQL>/d' temp2.csv > temp3.csv
!sed 's/^.*$/"&"/' temp3.csv > temp4.csv
!sed 's/[ ]*","/","/g' temp4.csv > temp5.csv
!sed 's/","[ ]*/","/g' temp5.csv > $5$DATE.csv
!awk 'NR==1{x=$0;print}$0!=x' temp6.csv > $5$DATE.csv.csv
!rm temp*.csv
EXIT;
EOF
*/
This script takes some parameters from the command line and gives me a formatted CSV file. The command is below:
sh Run_queries.sh 753807 NULL NULL NULL Hess-RedPOAlertReport Hess-RedPOAlertReport.sql NULL
But the 'awk' step fails with the error below:
awk: syntax error near line 1
awk: illegal statement near line 1
awk: syntax error near line 1
awk: bailing out near line 1
Is there a way to run awk on the CSV file from within Run_queries.sh?
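One possible cause, offered as an assumption rather than a confirmed diagnosis: the awk line sits inside an unquoted here-document ('<< EOF'), where the shell expands '$0' before sqlplus ever reads the text, and single quotes do not prevent that expansion. awk would then receive the script name in place of '$0'. The sketch below uses 'cat' as a stand-in for sqlplus to show the text that is actually delivered, with and without a backslash in front of '$0'.

```shell
# 'cat' stands in for sqlplus here; it simply echoes what the here-document delivers.

# Unquoted here-document, no escaping: the shell replaces $0 with its own name.
unescaped=$(cat << EOF
awk 'NR==1{x=$0;print}$0!=x'
EOF
)

# Same here-document with \$0: the backslash keeps a literal $0 for awk.
escaped=$(cat << EOF
awk 'NR==1{x=\$0;print}\$0!=x'
EOF
)

printf 'unescaped: %s\n' "$unescaped"
printf 'escaped:   %s\n' "$escaped"
```

If this is what is happening, writing \$0 in the awk line of Run_queries.sh, or running the awk command after the here-document has ended, may avoid the syntax error. Separately, the awk line reads temp6.csv, which none of the sed lines shown creates.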