Replacing dates]] with (dates)]]

Hi guys,

For my wiki site I need to fix 1400 pages that use the wrong date format. Most pages (not all) use e.g. 1988]] and I need to change that to (1988)]]

The date range goes back to 1400, so I guess I need to do the following:

ssh into my server,
dump mysql database
vi .sql dump
search and replace string
restore new .sql
voila

My request here is: can someone please write the search and replace string for this?

Hi, for my wiki I also need to solve a more complicated problem.

I have 1400 pages of court cases that are linked throughout the site. The problem is that the titles of the pages are

name v name (1988)
and the links should therefore be [[name v name (1988)]]

But, something went wrong and loads of links look like this

[[name vname (1988)]]

So I need a search and replace to change it back to normal. The issue I can see is that if we simply replace every " v[a-z]" with " v [a-z]", it will also change " victor" into " v ictor".

So can someone please come up with a search and replace solution for this?

---------- Post updated 15-01-11 at 12:28 AM ---------- Previous update was 14-01-11 at 10:41 PM ----------

So I guess something like this (sorry, I cannot write the expressions):

if " v victor", don't do anything; if " victor", change to " v ictor"

I am sure that you can create an SQL statement to do what you want, but if you want to do it the way you describe, you can use sed to find and replace the errant string:

sed 's/\([0-9]\{4\}\)]]/(\1)]]/g'
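
To see what that expression does before running it on a real dump, you can pipe a sample line through it. This is just a sketch: wrap_years is a made-up name for illustration, and the sample line is invented. Note that it only touches a four-digit number sitting directly in front of ]], so a year that already has brackets is left alone and nothing else inside the link is changed.

```shell
#!/bin/sh
# Hypothetical wrapper around the sed expression above; reads stdin, writes stdout.
wrap_years() {
    # Capture four digits directly before "]]" and put them in parentheses.
    sed 's/\([0-9]\{4\}\)]]/(\1)]]/g'
}

# One link with a bare year, one that is already correct.
printf '%s\n' 'see [[Ashford vthornton 1818]] and [[Bird v Jones (1845)]]' | wrap_years
# prints: see [[Ashford vthornton (1818)]] and [[Bird v Jones (1845)]]
```

Be aware that any link ending in four digits is affected, whether or not those digits are a year.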

Thank you so much!
Would you know how to write the MySQL statement?

---------- Post updated at 12:55 AM ---------- Previous update was at 12:05 AM ----------

Hi, I just tried it, but it seems it did not have the desired effect.
Here is a sample:

Statutes remain law until they are repealed by Parliament. Sometimes this creates absurdities like [[Ashford vthornton 1818]] (right to trial by combat), and [[Prince ernest of hanover vattorney general 1957]], and Parliament normally moves to repeal such legislation rather quickly.
* The principle of ejusdem generis applies if the passage can be so read; that is, where a statute provides a list like `X, Y, and similar Zs', then it will only apply to an item not on the list if it really is of the class Z. For example, `cats, dogs, and other pets' probably excludes lions (as not being pets), but it is not clear whether `cats, dogs, and other animals' would include lions (see the specific entry for [[Ejusdem generis]] for more details).
* The principle of expressio unius est exclusio alterius (``the expression of one is the exclusion of others'') applies. So, for example, ``land and coalmines'' does not include slate mines, because if that meaning were intended that clause would have been ``land and mines''.
* Statutes are assumed not the alter the common law, unless this is stated explicitly. Of course, many statutes are created specifically to alter the common law, and the `mischief rule' relies for its effect on this fact.
* Statutes are assumed not to violate international law; however, if the wording is clear a court cannot disapply a statute even if it is in flagrant violation (e.g., [[R v environment agency ex parte marchiori 2002]]).

Hmm, it appears to work fine for me, at least for the problem as you described it to this forum.

Please point out by example what you believe the correct output should be.

I merged your second post on this topic to this post as they relate to the same issue.

Please take the time to study your problem and provide us with a clear, concise description of it.

Hi,

Instead of

[[Ashford vthornton 1818]] 

it should be

[[Ashford vthornton (1818)]] 

or combined with my other post it should really be

[[Ashford v thornton (1818)]] 

Surely it should be:

[[Ashford v Thornton (1818)]]

You need to be precise if you wish to receive precise help from us.

I am so sorry,

I did correct it right before your posting, I believe. You are correct, that is the way it should be (minor correction below). This was never an issue as only a handful of people used my site, but lately we get so much traffic that I thought it was time to do some housecleaning. Doing it manually is just too much work! Sorry for the incorrect posting.

It should be exactly

[[Ashford v thornton (1818)]]

---------- Post updated at 10:50 AM ---------- Previous update was at 10:35 AM ----------

One more example

is not a false imprisonment (see: [[Bird vjones 1845]]), although it might be actionable in [[Nuisance]]. It probably is not false imprisonment if a person accepts confinement voluntarily, with a contractual agreement as to his mode of release see: [[Robinson v balmain new ferry co 1910]].

should thus be

is not a false imprisonment (see: [[Bird v jones (1845)]]), although it might be actionable in [[Nuisance]]. It probably is not false imprisonment if a person accepts confinement voluntarily, with a contractual agreement as to his mode of release see: [[Robinson v balmain new ferry co (1910)]].

Here is a small crude C program which should do 98% of what you need to do if you have defined and described your problem correctly to us. The remaining 2% is where the letter 'v' is the first letter of a plaintiff or defendant.

Please compile it and use it as follows:

gcc -o change change.c
chmod 755 change
./change infile outfile

Please try it out on a copy of your data and let me know how you get on.

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

char linebuf[5000];
char newlinebuf[5000];
char patternbuf[500];


char *
change_citation()
{
   static char newbuf[500];
   static char newbuf1[500];

   char foundv = 0;
   char words = 0;
   char *q = patternbuf;
   char *r = patternbuf;
   char *t = patternbuf;
   char *n;
   char w;
   char fix = 0;

#if DEBUG
   fprintf(stderr, "patternbuf: %s\n", patternbuf);
#endif

   // count spaces and parse for ' v '. Set flag if found
   q = patternbuf;
   while (*q) {
      if (*q == ' ') words++;
      if (*q == ' ' && *(q+1) == 'v' && *(q+2) == ' ') {
         foundv = 1;
      }
      q++;
   }

#if DEBUG
   fprintf(stderr, "words: %d V: %d\n", words, foundv);
#endif

   // find end of citation
   while (*t) t++;
   // fix up brackets around citation year if necessary
   n = newbuf;
   q = patternbuf;
   while (*q) {
      if (q == t - 8 && *q != '(' ) {   // need to fix
         *n++ = *q++;
         *n++ = *q++;
         t = t - 6;
         *n++ = '(';
         *n++ = *t++;
         *n++ = *t++;
         *n++ = *t++;
         *n++ = *t++;
         *n++ = ')';
         *n++ = *t++;
         *n++ = *t;
         break;
      }
      *n++ = *q++;
   }
   *n = '\0';

   if (foundv)
      return(newbuf);

   // select a word and split on v
   q = newbuf;
   n = newbuf1;
   w = 0;
   while (*q) {
      if (*q == ' ') w++;
      *n++ = *q++;
      if (fix) continue;

      if ((words == 2 && w == 1 && *q == 'v') ||
          (words == 3 && w > 1 && *q == 'v') ||
          (words >= 4 && w > 1 && *q == 'v')) {
         *n++ = *q++;
         *n++ = ' ';
         *n++ = toupper(*q++);
         fix = 1;
      }

   }
   *n = '\0';

   return (newbuf1);
}

int
main(int argc,
     char *argv[])
{
   FILE *from, *to;
   int ch;                  /* int, not char, so EOF can be told apart from data */
   char *lp = linebuf;
   char *p;
   char *pp;
   char *nlp;

   if (argc != 3) {
       printf("Usage: change <source> <destination>\n");
       exit(1);
   }

   /* open source file */
   if ((from = fopen(argv[1], "r")) == NULL) {
       printf("Cannot open source file.\n");
       exit(1);
   }

   /* open destination file */
   if ((to = fopen(argv[2], "w")) == NULL) {
       printf("Cannot open destination file.\n");
       exit(1);
   }

   /* copy the file and fix up as necessary */
   while ((ch = fgetc(from)) != EOF) {
      *lp++ = ch;
      if (ch == '\n') {
         *lp = '\0';

         /* got a line */
        lp = linebuf;
        nlp = newlinebuf;
        while (*lp) {

           /* check for start of pattern */
           if (*lp == '[' && *(lp+1) == '[' ) {       // start of citation
              pp = patternbuf;
              while (*lp) {
                 if (*lp == ']' && *(lp+1) == ']') {  // end of citation
                    *pp++ = *lp++;
                    *pp++ = *lp++;
                    *pp   = '\0';

                    pp = change_citation();           // parse citation and change if necessary
                    while (*pp)                       // write citation out
                      *nlp++ = *pp++;

                    break;
                 }
                 *pp++ = *lp++;
              }
           }

           *nlp++ = *lp++;
        }
        *nlp = '\0';

        /* write out line */
        nlp = newlinebuf;
        while (*nlp) fputc(*nlp++, to);
        lp = linebuf;
     }
  }

   if (fclose(from) == EOF) {
      printf("Error closing source file.\n");
      exit(1);
   }

   if (fclose(to) == EOF) {
      printf("Error closing destination file.\n");
      exit(1);
   }

   exit(0);
}

On the following test file

#
false imprisonment (see: [[Bird v Jones (1845)]]).
false imprisonment (see: [[Bird v Jones (1845)]] and [[Smith v Jones (1865)]]).
false imprisonment (see: [[Bird et al v Jones et al 1845]] and arrest (see: [[Smith v Jones (1865)]]).
false imprisonment (see: [[Bird vjones 1845]]) and also arrest (see: [[Smith vjones 1865]]).
#
false imprisonment (see: [[Bird vjones (1845)]]) and arrest (see: [[Smith vjones (1865)]]).
false imprisonment (see: [[Bird Murphy vjones (1845)]]).
false imprisonment (see: [[Bird Murphy vjones smith (1845)]]).
false imprisonment (see: [[Bird et al vjones et al (1845)]]).

it outputs

#
false imprisonment (see: [[Bird v Jones (1845)]]).
false imprisonment (see: [[Bird v Jones (1845)]] and [[Smith v Jones (1865)]]).
false imprisonment (see: [[Bird et al v Jones et al (1845)]] and arrest (see: [[Smith v Jones (1865)]]).
false imprisonment (see: [[Bird v Jones (1845)]]) and also arrest (see: [[Smith v Jones (1865)]]).
#
false imprisonment (see: [[Bird v Jones (1845)]]) and arrest (see: [[Smith v Jones (1865)]]).
false imprisonment (see: [[Bird Murphy v Jones (1845)]]).
false imprisonment (see: [[Bird Murphy v Jones smith (1845)]]).
false imprisonment (see: [[Bird et al v Jones et al (1845)]]).

Hi,

I get the following

Segmentation fault (core dumped)

I copied the code I gave you from this site, compiled it and ran it against the test file I also provided. It performed as expected and did not core dump.

You obviously did something different. Please tell us what you did differently and why you are getting different results such as a core dump.

OK, it works perfectly on the test file.
When I ran it on my MySQL dump, it produced the core dump error.

How can I overcome this?

It means that there is something in the MySQL dump file other than what you described to me, probably nulls or lines longer than 5000 characters.

If you can give me a pointer to the file so I can download it, I will figure out what is going on. Otherwise there is not much that I can do.
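
If you want to test that guess before sending the file anywhere, a quick check along these lines would show whether the dump has lines longer than the 5000-character buffers or contains NUL bytes. This is only a sketch: longest_line and count_nuls are made-up names for illustration, not part of the C program.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting a dump file before running the C program on it.

# Print the length of the longest line in the file named by $1.
# awk counts characters or bytes depending on the implementation and locale.
longest_line() {
    awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }' "$1"
}

# Print how many NUL bytes the file named by $1 contains.
count_nuls() {
    # Delete everything except NUL, then count what is left.
    tr -cd '\000' < "$1" | wc -c
}
```

Call them as "longest_line dump.sql" and "count_nuls dump.sql", where dump.sql stands for your own file. A longest line well above 5000, or a NUL count above zero, would both explain the crash.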


Hi,

It's 132 MB in size. Right now I am dumping it as a CSV file to see if that makes a difference. If that does not help, could I PM you the file location of the dump?

---------- Post updated at 11:02 PM ---------- Previous update was at 10:39 PM ----------

Hi,

I tried it on the csv file and it gave me the following result,

root@server [/home/lawiki/dump]# ./change lawiki_ukwiki1.csv lawiki.csv
*** glibc detected *** ./change: free(): invalid pointer: 0x9f584000 ***
======= Backtrace: =========
/lib/libc.so.6[0x9f4875a5]
/lib/libc.so.6(cfree+0x59)[0x9f4879e9]
/lib/libc.so.6(_IO_free_backup_area+0x33)[0x9f483553]
/lib/libc.so.6(__uflow+0x5c)[0x9f483bfc]
/lib/libc.so.6(getc+0xac)[0x9f47d9ec]
./change[0x8048846]
/lib/libc.so.6(__libc_start_main+0xdc)[0x9f433e9c]
./change[0x8048451]
======= Memory map: ========
08048000-08049000 r-xp 00000000 08:05 40212539   /home/lawiki/dump/change
08049000-0804a000 rw-p 00001000 08:05 40212539   /home/lawiki/dump/change
0804a000-0807e000 rw-p 00000000 00:00 0          [heap]
9f404000-9f40f000 r-xp 00000000 08:05 41713882   /lib/libgcc_s-4.1.2-20080825.so.1
9f40f000-9f410000 rw-p 0000a000 08:05 41713882   /lib/libgcc_s-4.1.2-20080825.so.1
9f41d000-9f41e000 rw-p 00000000 00:00 0
9f41e000-9f571000 r-xp 00000000 08:05 41713695   /lib/libc-2.5.so
9f571000-9f573000 r--p 00152000 08:05 41713695   /lib/libc-2.5.so
9f573000-9f574000 rw-p 00154000 08:05 41713695   /lib/libc-2.5.so
9f574000-9f578000 rw-p 00000000 00:00 0
9f583000-9f585000 rw-p 00000000 00:00 0
9f585000-9f586000 r-xp 00000000 00:00 0          [vdso]
9f586000-9f5a1000 r-xp 00000000 08:05 41713672   /lib/ld-2.5.so
9f5a1000-9f5a2000 r--p 0001a000 08:05 41713672   /lib/ld-2.5.so
9f5a2000-9f5a3000 rw-p 0001b000 08:05 41713672   /lib/ld-2.5.so
b4671000-b4687000 rw-p 00000000 00:00 0          [stack]
Aborted (core dumped)

The utility certainly will not work with a CSV file. It was designed to work specifically with the example data you provided.

Just send me a private message with the location of the file and I will look at it.

Or back to the original approach. MediaWiki exports data using an XML format; MySQL uses an SQL format. Let's assume the SQL format, with one insert-record per line (which may require a special mysqldump option):

cp dump.sql dump-working.sql
perl -pi -e 's/\[\[(\S+) v(\S+) (\d{4})\]\]/[[$1 v $2 ($3)]]/g' dump-working.sql

You can then do a diff to see if it's close to what you want:

diff dump.sql dump-working.sql | less

Hmm, I believe the Perl script from Otheus fails to fix the problem. For example, consider the following test file:

false imprisonment (see: [[Bird v Jones (1845)]]).
false imprisonment (see: [[Bird v Jones (1845)]] and [[Smith v Jones (1865)]]).
false imprisonment (see: [[Bird et al v Jones et al 1845]] and arrest (see: [[Smith v Jones (1865)]]).
false imprisonment (see: [[Bird vjones 1845]]) and also arrest (see: [[Smith vjones 1865]]).
false imprisonment (see: [[Bird vjones (1845)]]) and arrest (see: [[Smith vjones (1865)]]).
false imprisonment (see: [[Bird Murphy vjones (1845)]]).
false imprisonment (see: [[Bird Murphy vjones smith (1845)]]).
false imprisonment (see: [[Bird et al vjones et al (1845)]]).

Here is the output generated by the Perl script:

false imprisonment (see: [[Bird v Jones (1845)]]).
false imprisonment (see: [[Bird v Jones (1845)]] and [[Smith v Jones (1865)]]).
false imprisonment (see: [[Bird et al v Jones et al 1845]] and arrest (see: [[Smith v Jones (1865)]]).
false imprisonment (see: [[Bird v jones (1845)]]) and also arrest (see: [[Smith v jones (1865)]]).
false imprisonment (see: [[Bird vjones (1845)]]) and arrest (see: [[Smith vjones (1865)]]).
false imprisonment (see: [[Bird Murphy vjones (1845)]]).
false imprisonment (see: [[Bird Murphy vjones smith (1845)]]).
false imprisonment (see: [[Bird et al vjones et al (1845)]]).

The problem with the " v " remains!

The problem domain is complicated by the fact that the number of words within the citation is variable. The next issue is to uppercase the first letter after " v " if it is not already uppercase. Solve these and the rest is trivial.

I didn't pay close attention to the follow-up posts of this thread. Make it two-pass. Let's try this:

perl -pe 's/\[\[(\w.*?) v(\w)(.*?) (\d{4}|\(\d{4}\))\]\]/[[$1 v \U$2\E$3 $4]]/;s/(\[\[.*? )(\d{4})(\]\])/$1($2)$3/g' dump-working.sql

The first substitution handles the "v" problem, while the second handles the dates. (It could be done all in one, assuming that ALL such references were wrong, but if some are partially correct, the one-pass version fails on those.)

Hi,

Thank you so much!
About 70% of the links are displayed correctly. Will the above break the correct ones?

That's rather difficult to say since I've seen only about 20 examples. But the substitutions fix those 20 and don't break the working ones.