Hi, for my wiki I need a somewhat more complicated way to solve a problem.
I have 1400 pages of court cases that are linked throughout the site. The page titles have the form
name v name (1988)
and the links should therefore be [[name v name (1988)]]
But something went wrong, and loads of links now look like this:
[[name vname (1988)]]
So I need a search and replace to change them back to normal. The issue I can see is that if we simply replace every " v[a-z]" with " v [a-z]", it will also change " victor" into " v ictor".
Can someone please come up with a search and replace solution for this?
---------- Post updated 15-01-11 at 12:28 AM ---------- Previous update was 14-01-11 at 10:41 PM ----------
So I guess something like this (sorry, I cannot write the expressions):
if the link already contains " v victor", don't do anything; if it only contains " victor", change it to " v ictor".
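The rule described in the post above can be written out as a small C helper. This is only a sketch under assumptions: `split_glued_v` and `TITLE_MAX` are hypothetical names, and the caller is assumed to have already extracted the text between `[[` and `]]`. A party name that itself begins with "v" and comes before the glued "v" is still ambiguous and is not handled here.

```c
#include <ctype.h>
#include <string.h>

#define TITLE_MAX 512   /* assumed upper bound on the length of a link title */

/* Hypothetical helper: returns a copy of `title` with a space inserted after
 * the first " v" that is glued to a following letter. Titles that already
 * contain " v " are returned unchanged. Returns NULL if the title is longer
 * than TITLE_MAX. The result lives in a static buffer. */
const char *split_glued_v(const char *title)
{
    static char out[TITLE_MAX + 2];   /* +1 inserted space, +1 terminator */
    size_t len = strlen(title);
    const char *hit = NULL;
    const char *p;
    size_t head;

    if (len > TITLE_MAX)
        return NULL;

    /* Only look for a glued "v" when no proper " v " separator exists. */
    if (strstr(title, " v ") == NULL) {
        for (p = strstr(title, " v"); p != NULL; p = strstr(p + 1, " v")) {
            if (isalpha((unsigned char)p[2])) {
                hit = p;
                break;
            }
        }
    }

    if (hit == NULL) {
        memcpy(out, title, len + 1);          /* nothing to fix */
        return out;
    }

    head = (size_t)(hit - title) + 2;         /* up to and including " v" */
    memcpy(out, title, head);
    out[head] = ' ';
    memcpy(out + head + 1, title + head, len - head + 1);   /* rest + NUL */
    return out;
}
```

With this, "name vname (1988)" comes back as "name v name (1988)", while "name v victor (1988)" is returned unchanged because it already contains " v ".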
I am sure that you can create an SQL statement to do what you want but if you want to do it the way you describe, you can use sed to find and replace the errant string
Thank you so much!
Would you know how to write the MySQL statement?
---------- Post updated at 12:55 AM ---------- Previous update was at 12:05 AM ----------
Hi, I just tried it, but it does not seem to have had the desired effect.
Here is a sample:
Statutes remain law until they are repealed by Parliament. Sometimes this creates absurdities like [[Ashford vthornton 1818]] (right to trial by combat), and [[Prince ernest of hanover vattorney general 1957]], and Parliament normally moves to repeal such legislation rather quickly.
* The principle of ejusdem generis applies if the passage can be so read; that is, where a statute provides a list like `X, Y, and similar Zs', then it will only apply to an item not on the list if it really is of the class Z. For example, `cats, dogs, and other pets' probably excludes lions (as not being pets), but it is not clear whether `cats, dogs, and other animals' would include lions (see the specific entry for [[Ejusdem generis]] for more details).
* The principle of expressio unius est exclusio alterius (``the expression of one is the exclusion of others'') applies. So, for example, ``land and coalmines'' does not include slate mines, because if that meaning were intended that clause would have been ``land and mines''.
* Statutes are assumed not the alter the common law, unless this is stated explicitly. Of course, many statutes are created specifically to alter the common law, and the `mischief rule' relies for its effect on this fact.
* Statutes are assumed not to violate international law; however, if the wording is clear a court cannot disapply a statute even if it is in flagrant violation (e.g., [[R v environment agency ex parte marchiori 2002]]).
I believe I corrected it right before your posting; you are correct, that is the way it should be (minor correction below). This was never an issue while only a handful of people used my site, but lately we get so much traffic that I thought it was time to do some housecleaning, and doing it manually is just too much work! Sorry for the incorrect posting.
It should be exactly
[[Ashford v thornton (1818)]]
---------- Post updated at 10:50 AM ---------- Previous update was at 10:35 AM ----------
One more example
is not a false imprisonment (see: [[Bird vjones 1845]]), although it might be actionable in [[Nuisance]]. It probably is not false imprisonment if a person accepts confinement voluntarily, with a contractual agreement as to his mode of release see: [[Robinson v balmain new ferry co 1910]].
should thus be
is not a false imprisonment (see: [[Bird v jones (1845)]]), although it might be actionable in [[Nuisance]]. It probably is not false imprisonment if a person accepts confinement voluntarily, with a contractual agreement as to his mode of release see: [[Robinson v balmain new ferry co (1910)]].
Here is a small crude C program which should do 98% of what you need to do if you have defined and described your problem correctly to us. The remaining 2% is where the letter 'v' is the first letter of a plaintiff or defendant.
Please try it out on a copy of your data and let me know how you get on.
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

/* Fixed-size working buffers. Overlong lines are processed in chunks and
   overlong links are copied through unchanged (see fix_line and main). */
char linebuf[5000];
char newlinebuf[10000];   /* each fixed link grows by at most 3 characters */
char patternbuf[500];

/* Fix the link held in patternbuf ("[[...]]") and return the result. */
char *
change_citation(void)
{
    static char newbuf[sizeof patternbuf + 4];
    static char newbuf1[sizeof patternbuf + 4];
    int foundv = 0;
    int words = 0;
    char *q;
    char *t = patternbuf;
    char *n;
    int w;
    int fix = 0;
    int bare_year;

#if DEBUG
    fprintf(stderr, "patternbuf: %s\n", patternbuf);
#endif

    // count spaces and parse for ' v '. Set flag if found
    q = patternbuf;
    while (*q) {
        if (*q == ' ') words++;
        if (*q == ' ' && *(q+1) == 'v' && *(q+2) == ' ') {
            foundv = 1;
        }
        q++;
    }

#if DEBUG
    fprintf(stderr, "words: %d V: %d\n", words, foundv);
#endif

    // find end of citation
    while (*t) t++;

    // a bare year is a space and four digits directly before the closing "]]"
    bare_year = (t - patternbuf >= 8) && *(t-8) != '(' && *(t-7) == ' ' &&
                isdigit((unsigned char)*(t-6)) && isdigit((unsigned char)*(t-5)) &&
                isdigit((unsigned char)*(t-4)) && isdigit((unsigned char)*(t-3));

    // fix up brackets around citation year if necessary
    n = newbuf;
    q = patternbuf;
    while (*q) {
        if (bare_year && q == t - 8) {    // need to fix
            *n++ = *q++;
            *n++ = *q++;
            t = t - 6;
            *n++ = '(';
            *n++ = *t++;
            *n++ = *t++;
            *n++ = *t++;
            *n++ = *t++;
            *n++ = ')';
            *n++ = *t++;
            *n++ = *t;
            break;
        }
        *n++ = *q++;
    }
    *n = '\0';

    if (foundv)
        return (newbuf);

    // select a word that starts with 'v' and split it after the 'v':
    // the second word of a three-word link, otherwise any later word
    q = newbuf;
    n = newbuf1;
    w = 0;
    while (*q) {
        if (*q == ' ') w++;
        *n++ = *q++;
        if (fix) continue;
        if (*(q-1) == ' ' && *q == 'v' &&
            ((words == 2 && w == 1) || (words >= 3 && w > 1))) {
            *n++ = *q++;
            *n++ = ' ';
            *n++ = toupper((unsigned char)*q++);
            fix = 1;
        }
    }
    *n = '\0';

    return (newbuf1);
}

/* Copy linebuf to newlinebuf, fixing every [[...]] link on the way. */
void
fix_line(void)
{
    char *lp = linebuf;
    char *nlp = newlinebuf;
    char *pp;
    int closed;

    while (*lp) {
        /* check for start of pattern */
        if (*lp == '[' && *(lp+1) == '[') {              // start of citation
            closed = 0;
            pp = patternbuf;
            while (*lp) {
                if (*lp == ']' && *(lp+1) == ']') {      // end of citation
                    *pp++ = *lp++;
                    *pp++ = *lp++;
                    *pp = '\0';
                    pp = change_citation();   // parse citation and change if necessary
                    while (*pp)               // write citation out
                        *nlp++ = *pp++;
                    closed = 1;
                    break;
                }
                if (pp - patternbuf >= (long)sizeof patternbuf - 3)
                    break;                    // too long to be a citation
                *pp++ = *lp++;
            }
            if (!closed) {    // no usable "]]": copy the text through unchanged
                *pp = '\0';
                for (pp = patternbuf; *pp; )
                    *nlp++ = *pp++;
            }
            continue;
        }
        *nlp++ = *lp++;
    }
    *nlp = '\0';
}

int
main(int argc, char *argv[])
{
    FILE *from, *to;
    int ch;
    char *lp = linebuf;

    if (argc != 3) {
        printf("Usage: change <source> <destination>\n");
        exit(1);
    }

    /* open source file */
    if ((from = fopen(argv[1], "r")) == NULL) {
        printf("Cannot open source file.\n");
        exit(1);
    }

    /* open destination file */
    if ((to = fopen(argv[2], "w")) == NULL) {
        printf("Cannot open destination file.\n");
        exit(1);
    }

    /* copy the file and fix up as necessary */
    while ((ch = fgetc(from)) != EOF) {
        *lp++ = ch;
        /* got a line, or the line buffer is full */
        if (ch == '\n' || lp - linebuf >= (long)sizeof linebuf - 1) {
            *lp = '\0';
            fix_line();
            fputs(newlinebuf, to);    /* write out line */
            lp = linebuf;
        }
    }
    if (lp != linebuf) {              /* last line had no trailing newline */
        *lp = '\0';
        fix_line();
        fputs(newlinebuf, to);
    }

    if (fclose(from) == EOF) {
        printf("Error closing source file.\n");
        exit(1);
    }

    if (fclose(to) == EOF) {
        printf("Error closing destination file.\n");
        exit(1);
    }

    exit(0);
}
On the following test file
#
false imprisonment (see: [[Bird v Jones (1845)]]).
false imprisonment (see: [[Bird v Jones (1845)]] and [[Smith v Jones (1865)]]).
false imprisonment (see: [[Bird et al v Jones et al 1845]] and arrest (see: [[Smith v Jones (1865)]]).
false imprisonment (see: [[Bird vjones 1845]]) and also arrest (see: [[Smith vjones 1865]]).
#
false imprisonment (see: [[Bird vjones (1845)]]) and arrest (see: [[Smith vjones (1865)]]).
false imprisonment (see: [[Bird Murphy vjones (1845)]]).
false imprisonment (see: [[Bird Murphy vjones smith (1845)]]).
false imprisonment (see: [[Bird et al vjones et al (1845)]]).
it outputs
#
false imprisonment (see: [[Bird v Jones (1845)]]).
false imprisonment (see: [[Bird v Jones (1845)]] and [[Smith v Jones (1865)]]).
false imprisonment (see: [[Bird et al v Jones et al (1845)]] and arrest (see: [[Smith v Jones (1865)]]).
false imprisonment (see: [[Bird v Jones (1845)]]) and also arrest (see: [[Smith v Jones (1865)]]).
#
false imprisonment (see: [[Bird v Jones (1845)]]) and arrest (see: [[Smith v Jones (1865)]]).
false imprisonment (see: [[Bird Murphy v Jones (1845)]]).
false imprisonment (see: [[Bird Murphy v Jones smith (1845)]]).
false imprisonment (see: [[Bird et al v Jones et al (1845)]]).
I copied the code I gave you from this site, compiled it and ran it against the test file I also provided. It performed as expected and did not core dump.
You obviously did something different. Please tell us what you did differently and why you are getting different results such as a core dump.
It's 132 MB in size. Right now I am dumping it as a CSV file to see if that makes a difference. If that does not help, could I PM you the file location of the dump?
---------- Post updated at 11:02 PM ---------- Previous update was at 10:39 PM ----------
Hi,
I tried it on the csv file and it gave me the following result,
Or back to the original approach. MediaWiki exports data using an XML format; MySQL uses an SQL format. Let's assume the SQL format, with one insert record per line (which may require a special mysqldump option).
Humm, I believe the output from the Otheus Perl script fails to fix the problem. For example, consider the following test file:
false imprisonment (see: [[Bird v Jones (1845)]]).
false imprisonment (see: [[Bird v Jones (1845)]] and [[Smith v Jones (1865)]]).
false imprisonment (see: [[Bird et al v Jones et al 1845]] and arrest (see: [[Smith v Jones (1865)]]).
false imprisonment (see: [[Bird vjones 1845]]) and also arrest (see: [[Smith vjones 1865]]).
false imprisonment (see: [[Bird vjones (1845)]]) and arrest (see: [[Smith vjones (1865)]]).
false imprisonment (see: [[Bird Murphy vjones (1845)]]).
false imprisonment (see: [[Bird Murphy vjones smith (1845)]]).
false imprisonment (see: [[Bird et al vjones et al (1845)]]).
Here is the output generated by the Perl script:
false imprisonment (see: [[Bird v Jones (1845)]]).
false imprisonment (see: [[Bird v Jones (1845)]] and [[Smith v Jones (1865)]]).
false imprisonment (see: [[Bird et al v Jones et al 1845]] and arrest (see: [[Smith v Jones (1865)]]).
false imprisonment (see: [[Bird v jones (1845)]]) and also arrest (see: [[Smith v jones (1865)]]).
false imprisonment (see: [[Bird vjones (1845)]]) and arrest (see: [[Smith vjones (1865)]]).
false imprisonment (see: [[Bird Murphy vjones (1845)]]).
false imprisonment (see: [[Bird Murphy vjones smith (1845)]]).
false imprisonment (see: [[Bird et al vjones et al (1845)]]).
The problem with the " v " remains!
The problem domain is complicated by the fact that the number of words within the citation is variable. The next issue is to uppercase the first letter after " v " if it is not already uppercase. Solve these and the rest is trivial.
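One way to sidestep the variable word count is to locate the " v " separator itself rather than count words. The C sketch below illustrates only the uppercasing step, under assumptions: `capitalize_after_v` and `TITLE_MAX` are hypothetical names, the input is the text between `[[` and `]]`, and the " v " separator is assumed to be in place already.

```c
#include <ctype.h>
#include <string.h>

#define TITLE_MAX 512   /* assumed upper bound on the length of a link title */

/* Hypothetical helper: returns a copy of `title` in which the first character
 * after the first " v " separator is uppercased. The number of words on
 * either side of the separator does not matter. Returns NULL if the title is
 * longer than TITLE_MAX. The result lives in a static buffer. */
const char *capitalize_after_v(const char *title)
{
    static char out[TITLE_MAX + 1];
    char *sep;

    if (strlen(title) > TITLE_MAX)
        return NULL;

    strcpy(out, title);
    sep = strstr(out, " v ");
    if (sep != NULL && sep[3] != '\0')
        sep[3] = (char)toupper((unsigned char)sep[3]);
    return out;
}
```

For example, "Bird Murphy v jones smith (1845)" becomes "Bird Murphy v Jones smith (1845)", and a title whose defendant is already capitalized comes back unchanged.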
I didn't pay close attention to the follow-up posts of this thread. Make it two-pass. Let's try this:
perl -pe 's/\[\[(\w[^\]]*?) v(\w)([^\]]*?) (\d{4}|\(\d{4}\))\]\]/[[$1 v \U$2\E$3 $4]]/g;s/(\[\[.*? )(\d{4})(\]\])/$1($2)$3/g' dump-working.sql
The first substitution handles the "v" problem, while the second handles the dates. (It could all be done in one substitution if ALL such references were wrong, but if some are already partially correct, the one-pass version fails on those.)
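For comparison with the C program earlier in the thread, the second pass (the dates) might look like this in C. It is a sketch under assumptions rather than a translation of the Perl: `wrap_bare_year` and `TITLE_MAX` are hypothetical names, and the input is assumed to be the text between `[[` and `]]`.

```c
#include <ctype.h>
#include <string.h>

#define TITLE_MAX 512   /* assumed upper bound on the length of a link title */

/* Hypothetical helper: if `title` ends in a space followed by exactly four
 * digits, returns a copy with those digits wrapped in parentheses. Any other
 * title is returned unchanged. Returns NULL if the title is longer than
 * TITLE_MAX. The result lives in a static buffer. */
const char *wrap_bare_year(const char *title)
{
    static char out[TITLE_MAX + 3];   /* +2 parentheses, +1 terminator */
    size_t len = strlen(title);
    size_t i;

    if (len > TITLE_MAX)
        return NULL;

    if (len >= 5 && title[len - 5] == ' ') {
        for (i = len - 4; i < len; i++) {
            if (!isdigit((unsigned char)title[i]))
                break;
        }
        if (i == len) {                       /* all four are digits */
            memcpy(out, title, len - 4);      /* text up to and including the space */
            out[len - 4] = '(';
            memcpy(out + len - 3, title + len - 4, 4);
            out[len + 1] = ')';
            out[len + 2] = '\0';
            return out;
        }
    }

    memcpy(out, title, len + 1);              /* nothing to fix */
    return out;
}
```

Here "Bird v Jones 1845" becomes "Bird v Jones (1845)", while "Bird v Jones (1845)" and titles without a trailing year, such as "Nuisance", are left alone. Keeping this step separate from the "v" fix mirrors the two-pass idea: each pass is safe to run on links that are already partly correct.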