Doing Checks on a file

SAMZ · August 12, 2008, 5:54am

I have a process that I am trying to provide a solution for, and I have hit a brick wall and would like some pointers in the right direction.
On a daily basis a report is automatically generated in CSV format (FIRST.CSV), which includes codes and amounts in the following format:

A016ZZ ,, 1400.43
A011ZZ ,, 15000.98
B014ZZ ,, -450.83
B027ZZ ,, 86.90
.....

The amounts can be negative or positive. The codes, on the other hand, always have to begin with A or B and always have to end with ZZ.
After this file has been scrutinised it usually needs some amendments. To amend it, a second CSV file (SECOND.CSV) is created manually in the same format as above, that is, codes and the corresponding values.

These two files then need to be merged, which is fine, but before the merge the second file (SECOND.CSV) needs to have some checks run on it to ensure each code begins with an A or B, followed by three numerical digits, followed by the 'ZZ' postfix.

Could anyone assist as to how this check can be done? I have got as far as this:

if [ `cat ${SECOND.CSV} | egrep -v '^A|^B' | wc -l` -gt 0 ]
then
    print "code other than A or B present."
    exit 1
fi

buffoonix · August 12, 2008, 6:24am

Haven't tested it but I think it should be as easy as

if grep -qE '^[AB][0-9][0-9][0-9]' ${SECOND.CSV}; then
    # do something
fi

I am even convinced that the -E option to grep is redundant.
For such a trivial expression we don't need to fire up the extended regex engine.
Btw, a variable name containing a dot isn't good naming, I would say.

buffoonix · August 12, 2008, 6:28am

Oops, sorry got it wrong.
You require that each record contains this pattern.
Simply squeeze in the -v with the grep, so that the test succeeds when some record does not match.
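
For reference, a minimal sketch of the inverted check with the full pattern (three digits and the ZZ suffix included). It is only an illustration: check_codes is a made-up helper name, and the pattern assumes the code is followed by optional spaces and then a comma, as in the sample lines.

```shell
#!/bin/sh
# check_codes FILE  (hypothetical helper, not from the thread)
# Returns 0 if every line starts with A or B, three digits, then ZZ,
# optional spaces and a comma; returns 1 and lists the bad lines otherwise.
check_codes() {
    file=$1
    pattern='^[AB][0-9][0-9][0-9]ZZ *,'   # assumed record layout

    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 2
    fi

    # -v selects the lines that do NOT match, -c counts them.
    # grep exits non-zero when nothing is selected, hence the "|| true".
    bad=$(grep -cv "$pattern" "$file" || true)

    if [ "$bad" -gt 0 ]; then
        echo "$bad record(s) with an invalid code in $file:" >&2
        grep -nv "$pattern" "$file" >&2   # -n prefixes each line number
        return 1
    fi
    return 0
}
```

Called as check_codes SECOND.CSV || exit 1 before the merge step, it prints the offending lines with their line numbers on stderr.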

zaxxon · August 12, 2008, 6:53am

If you want to get rid of the invalid lines, you can use

sed -n '/^[AB][0-9]\{3\}ZZ /p' infile

Only the lines with a valid code are printed, so there is no need to check first and then take some action.

A little hint: if you use grep, don't cat a file into it. Just use grep <pattern> <filename>, as grep can take the file to be processed as a parameter. Era will be proud of me! ;D (sorry, that was a kind of insider joke)

SAMZ · August 12, 2008, 7:16am

OK, the above works a treat, thanks. This leads me to my next and final issue, but I will need to explain a little first.

Once the first file (FIRST.CSV) has been generated with the code and amount columns, adjustments have to be made to certain accounts. This is done by the use of the second file (SECOND.CSV). If a code in the second file matches a code in the first file, then the amount in the second file is added to (or subtracted from) the amount in the first file. If a code in the second file does not match any code in the first file, then the code and amount are appended to the bottom:

e.g
FIRST.CSV
A001ZZ ,, 400
A002ZZ ,, 300

SECOND.CSV
A001ZZ ,, -200
A002ZZ ,, 100
A003ZZ ,, 10

THIRD.CSV
A001ZZ ,, 200
A002ZZ ,, 400
A003ZZ ,, 10

The above is done by using the code:
join -t, -a1 -a2 ${SECOND.CSV} ${FIRST.CSV} > ${THIRD.CSV}
nawk -F, '{ printf $1",," "%011.2f\n",$3+$5 }' ${THIRD.CSV} > ${FINAL.CSV}

The problem with the above is that if, for example, A001ZZ needs two adjustments at the same time, the SECOND.CSV file will look like:
SECOND.CSV
A001ZZ ,, -200
A001ZZ ,, 300
A002ZZ ,, 100
A003ZZ ,, 10

My code will only pick up the first instance of A001ZZ and adjust as required; the second occurrence of A001ZZ will then be treated as a new code and be appended to the bottom of FINAL.CSV.

Could anyone please assist as to how I can resolve this issue?

era · August 12, 2008, 7:18am

zaxxon: It's not like I'm a higher authority on that particular issue, just a more vocal one, perhaps.

SAMZ · August 12, 2008, 10:17am

anyone with an idea????
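
One possible way around the duplicate-code problem, offered as a sketch rather than a confirmed solution: instead of join, let a single awk pass add up the amounts per code across both files, so a code that appears several times in SECOND.CSV is simply summed. merge_adjustments is a made-up name; the sketch assumes the amount is always the third comma-separated field and prints plain %.2f amounts rather than the zero-padded %011.2f used above. On systems where plain awk is the old version, nawk should work the same way.

```shell
#!/bin/sh
# merge_adjustments BASE ADJUSTMENTS  (hypothetical helper)
# Sums the amounts per code over both files and prints one line per code,
# in the order each code was first seen.
merge_adjustments() {
    awk -F, '
        NF < 3 { next }                     # skip blank or short lines
        {
            code = $1
            sub(/[ \t]+$/, "", code)        # drop trailing blanks after the code
            if (!(code in total))
                order[++n] = code           # remember first-seen order
            total[code] += $3               # field 2 is empty, field 3 is the amount
        }
        END {
            for (i = 1; i <= n; i++)
                printf "%s ,, %.2f\n", order[i], total[order[i]]
        }
    ' "$1" "$2"
}
```

With the FIRST.CSV from the example and the SECOND.CSV that has two A001ZZ lines, merge_adjustments FIRST.CSV SECOND.CSV > FINAL.CSV should give:

A001ZZ ,, 500.00
A002ZZ ,, 400.00
A003ZZ ,, 10.00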