How to compare two VCE files?

rdcwayx · June 16, 2010, 12:24am

Two VCE files (practice-exam files for IT certification tests) have been converted to text files. I now need a script to find the differences between them.

Each question block starts with a QUESTION line and ends with a Section line. Here is a sample.

QUESTION 7
A user logs into a system running the Solaris 8 Operating Environment using the telnet command. The user has been assigned a Korn shell (ksh)and home /home/user1 directory.
All of the following files exist with appropriate ownership and permissions. Which two files are always used
by the Korn shell to initialize this user's session? (Choose two.)

A. /etc/login
B. /etc/.login
C. /etc/profile
D. /home/user1/.login
E. /home/user1/.cshrc

Answer: CF
Section: (none) 

QUESTION 8
When using the command line to add, modify, or delete user accounts, or to add modify or delete groups, it is possible to use the -o option to allow duplicate users or group IDs.
Which four commands support the use of the -o syntax? (Choose four.)

A. userdel
B. adduser
C. useradd
D. usermod
E. groupmod
F. groupdel
G. groupadd

Answer: CDEG
Section: (none)

The questions in file A and file B are not in the same order; Question 7 in one file may be Question 45 in the other.
The number of questions also differs.

I'd like to find the questions that differ between file A and file B. I am working on it too; if you have a good idea, please post it here.

Corona688 · June 16, 2010, 12:58pm

Assuming identical questions are literally identical, I'd try splitting the files into individual questions, removing the question number, and taking the checksum of the rest of each question (i.e. echo "question text" | md5sum). You might keep files with one question each, named after their checksums, for later reference... Make a big list of all checksums from both papers and sort it. Any duplicate question will show up as two of the same checksum in a row.
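
A minimal sketch of that idea in POSIX shell, assuming every question begins with a line starting with QUESTION and that md5sum is installed. The function name question_sums is made up for illustration, and trimming trailing whitespace and blank lines is an extra assumption on my part, so that stray spaces do not change a checksum.

```shell
# question_sums FILE
# Print one "checksum question-number" line for every question in FILE.
question_sums() {
    tmp=$(mktemp -d) || return 1
    # Write the body of each question (without its "QUESTION n" line)
    # to a scratch file named after the question number.
    awk -v dir="$tmp" '
        { sub(/[ \t\r]+$/, "") }               # assumption: trailing whitespace is noise
        /^QUESTION/ { if (out != "") close(out); out = dir "/" $2; next }
        out != "" && $0 != "" { print > out }  # blank lines are skipped
    ' "$1"
    for f in "$tmp"/*; do
        [ -f "$f" ] || continue
        sum=$(md5sum < "$f")                   # prints "checksum  -"
        printf '%s %s\n' "${sum%% *}" "${f##*/}"
    done
    rm -rf "$tmp"
}

# Usage (file names are examples):
#   { question_sums VCE1.txt; question_sums VCE2.txt; } |
#       cut -d' ' -f1 | sort | uniq -d
# lists every checksum that occurs more than once in the combined list.
```

Note that a question repeated inside a single file would also be reported by uniq -d, so the result is only as good as the "literally identical" assumption.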

rdcwayx · June 16, 2010, 10:42pm

Thank you. I followed your idea and created the per-question files with the commands below:

awk '/^QUESTION/ {a=$2;$2=""} {print > ("VCE1-" a ".txt")}' VCE1.txt
awk '/^QUESTION/ {a=$2;$2=""} {print > ("VCE2-" a ".txt")}' VCE2.txt

Then get the checksums:

md5sum VCE1-*.txt | sort > VCE1-md5sum
md5sum VCE2-*.txt | sort > VCE2-md5sum

Here is the format of the md5sum list.

$ cat VCE1-md5sum
00bd766865d17c3e243c6784d6b37668 *VCE1-181.txt
0c344ffe57b2422bf0440dd27999052e *VCE1-251.txt
0c9892860724147297f7ecc161221e2d *VCE1-172.txt
0d9a7baa0f12d23172530f692ea0d04c *VCE1-228.txt
0db692b1bf295bb09743768918e48822 *VCE1-147.txt
0e40994fdd199aed380acf4392d5f030 *VCE1-168.txt

Does anyone have a better command to show the differences between VCE1-md5sum and VCE2-md5sum?

Here is my solution.

Show the questions that appear in only one file:

awk '{a[$1]++} END {for (i in a) {if (a[i]==1) print i}}' VCE*md5sum | xargs -I {} grep {} VCE*md5sum

Show the questions that appear in both files:

awk '{a[$1]++} END {for (i in a) {if (a[i]==2) print i}}' VCE*md5sum | xargs -I {} grep {} VCE*md5sum
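
One alternative that may be worth trying, as a sketch: both lists are keyed by checksum, so join can pair them up directly and also show which question in one file matches which question in the other. compare_sums is a hypothetical helper name; it assumes each list has one "checksum filename" pair per line, as in the listing above, and that no checksum repeats within a single list.

```shell
# compare_sums LIST1 LIST2
# LIST1 and LIST2 are md5sum listings ("checksum filename" per line).
compare_sums() {
    a=$(mktemp) || return 1
    b=$(mktemp) || return 1
    # join needs both inputs sorted on the join field in the same collation.
    LC_ALL=C sort "$1" > "$a"
    LC_ALL=C sort "$2" > "$b"
    echo "== only in $1 =="
    LC_ALL=C join -v 1 "$a" "$b"    # lines of LIST1 with no partner in LIST2
    echo "== only in $2 =="
    LC_ALL=C join -v 2 "$a" "$b"    # lines of LIST2 with no partner in LIST1
    echo "== in both =="
    LC_ALL=C join "$a" "$b"         # checksum, name from LIST1, name from LIST2
    rm -f "$a" "$b"
}

# Usage:
#   compare_sums VCE1-md5sum VCE2-md5sum
```

The "in both" section prints the two file names side by side, which gives the mapping between question numbers in the two files.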