Benutzer:Klausi/Amsmath-bot

From VoWi
Revision as of 13 October 2009, 11:15 by Migration-bot (replaced forum links)

A bot for the individual wikis on the wiki server. It replaces all <amsmath> tags with <math> tags and also adjusts a few differences between the two formats. <math> ships with MediaWiki by default and therefore does not have to be maintained as a separate module (amsmath does). See also the forum thread: f.thread:69368
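
To make the "differences between the two formats" concrete, here is a minimal sketch of the kind of rewriting the bot performs. It is not the bot itself: the names convert_amsmath, LITERAL_REPLACEMENTS and HSPACE are made up for this illustration, only a subset of the substitutions from amsmath-bot.py is shown, and the sketch assumes Python 3 without pywikipedia, whereas the actual script targets Python 2.

```python
import re

# A subset of the literal substitutions used in amsmath-bot.py.
LITERAL_REPLACEMENTS = [
    ("<amsmath>", "<math>"),
    ("</amsmath>", "</math>"),
    ("\\begin{split}", "\\begin{align}"),
    ("\\end{split}", "\\end{align}"),
    ("\\begin{aligned}", "\\begin{align}"),
    ("\\end{aligned}", "\\end{align}"),
    ("\\widetilde{", "\\tilde{"),
    ("\\dotsc", "\\ldots"),
    ("\\dotsm", "\\cdots"),
    ("\\intop", "\\int"),
]

# The script maps every \hspace{...} to \qquad; [^}]* stops at the
# first closing brace instead of running to the last one on the line.
HSPACE = re.compile(r"\\hspace\{[^}]*\}")


def convert_amsmath(text):
    """Return text with amsmath markup rewritten for the <math> tag."""
    for old, new in LITERAL_REPLACEMENTS:
        text = text.replace(old, new)
    # A function as replacement avoids escape processing of the backslash.
    return HSPACE.sub(lambda match: "\\qquad", text)


sample = "<amsmath>a_1, \\dotsc, a_n \\hspace{1cm} \\widetilde{x}</amsmath>"
print(convert_amsmath(sample))
# prints: <math>a_1, \ldots, a_n \qquad \tilde{x}</math>
```

Keeping the substitutions in a table makes it easy to try the rules on a few sample formulas before letting a bot edit real pages.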

The bot runs on the pywikipedia framework (like the old Umlaut-Bot). First check out the current version from SVN, then create a user-config.py file in the pywikipedia folder:

mylang = 'de'
family = 'mathe1'
usernames['mathe1']['de'] = 'Amsmath-bot'

Then create mathe1_family.py in the families subfolder:


# -*- coding: utf-8  -*-              # REQUIRED
import config, family, urllib         # REQUIRED

class Family(family.Family):          # REQUIRED
    def __init__(self):               # REQUIRED
        family.Family.__init__(self)  # REQUIRED
        self.name = 'mathe1'        # REQUIRED; replace with actual name

        self.langs = {                # REQUIRED
            'de': 'wikiserver.fsinf.at',  # Include one line for each wiki in family
        }

    def protocol(self, code):
        """
        Can be overridden to return 'https'. Other protocols are not supported.
        """
        return 'http'

    def scriptpath(self, code):
        """The prefix used to locate scripts on this wiki.

        This is the value displayed when you enter {{SCRIPTPATH}} on a
        wiki page (often displayed at [[Help:Variables]] if the wiki has
        copied the master help page correctly).

        The default value is the one used on Wikimedia Foundation wikis,
        but needs to be overridden in the family file for any wiki that
        uses a different value.

        """
        return '/mathe1'

    # IMPORTANT: the raise is active here, so the bot does not use the
    # api.php interface on this wiki. On a wiki that supports api.php,
    # remove the raise so that the return below takes effect.
    def apipath(self, code):
        raise NotImplementedError(
            "%s wiki family does not support api.php" % self.name)
        return '%s/api.php' % self.scriptpath(code)

    # Which version of MediaWiki is used?
    def version(self, code):
        # Replace with the actual version being run on your wiki
        return '1.13.2'

    def code2encoding(self, code):
        """Return the encoding for a specific language wiki"""
        # Most wikis nowadays use UTF-8, but change this if yours uses
        # a different encoding
        return 'utf-8'


Finally, create amsmath-bot.py in the pywikipedia folder:


# -*- coding: utf-8 -*-
 
 ## Copyright (C) 2008 klausi <klausi[ät]fsinf.at>
 ##
 ## amsmath-bot is free software; you can redistribute it and/or modify
 ## it under the terms of the GNU General Public License as published
 ## by the Free Software Foundation; version 3 or any later version.
 
import wikipedia # Import the wikipedia module
import re

site = wikipedia.getSite()
startpage = '!'
namespaces = site.namespaces()
namespaces.insert(0, "") # insert default namespace
# these namespaces result in an error (probably they are empty)
badNamespaces = []
# mathe1: [5, 7, 13, 15, -2, -1]
# mathe2: [5, 7, 9, 11, 13, 15, -2]
# mathe3: []
# statistik: [3, 4, 5, 7, 9, 11, 13, 15]
i = 0
#page = wikipedia.Page(site, u"Hauptseite")
#html = site.getUrl(u"/mathe1/index.php/Hauptseite");
#print html
for namespace in namespaces:
    print namespace
    nameSpaceIndex = site.getNamespaceIndex(namespace)
    if nameSpaceIndex in badNamespaces:
        continue
    for page in site.allpages(startpage, nameSpaceIndex):
        i = i+1
        try:
            text = page.get(get_redirect=True)
        except Exception:
            # Skip pages that cannot be loaded, but let Ctrl-C through
            print "GET Error: " + page.title()
            continue
        
        text2 = text.replace(u"<amsmath>", u"<math>")
        text2 = text2.replace(u"</amsmath>", u"</math>")
        text2 = text2.replace(u"\\begin{split}", u"\\begin{align}")
        text2 = text2.replace(u"\\end{split}", u"\\end{align}")
        text2 = text2.replace(u"\\begin{aligned}", u"\\begin{align}")
        text2 = text2.replace(u"\\end{aligned}", u"\\end{align}")
        text2 = text2.replace(u"\\widetilde{", u"\\tilde{")
        text2 = text2.replace(u"\\dotsc", u"\\ldots")
        text2 = text2.replace(u"\\dotsm", u"\\cdots")
        text2 = text2.replace(u"\\:", u"\\ ")
        text2 = text2.replace(u"\\intop", u"\\int")
        text2 = text2.replace(u"f\\\"{u}r", u"fuer")
        text2 = text2.replace(u"f\\\"ur", u"fuer")
        text2 = text2.replace(u"\\text,", u",")
        text2 = text2.replace(u"\\smallint", u"\\int")
        # Match the leading backslash too and stop at the first closing brace
        text2 = re.sub(u"\\\\hspace\\{[^}]*\\}", u"\\\\qquad", text2)
        text2 = text2.replace(u"\\text{'}", u"'")
        text2 = text2.replace(u"\\text{ ' }", u"'")
        text2 = text2.replace(u"\\begin{tabular}", u"\\begin{array}")
        text2 = text2.replace(u"\\end{tabular}", u"\\end{array}")
        
        # Delete empty math tags, but keep those inside HTML comments.
        # Splitting on a capturing pattern puts the comments at the odd
        # indexes; an unterminated "<!--" does not match and counts as text.
        commentPattern = re.compile(u"(<!--.*?-->)", re.DOTALL)
        parts = commentPattern.split(text2)
        for k in range(0, len(parts), 2):
            parts[k] = parts[k].replace(u"<math></math>", u"")
            parts[k] = parts[k].replace(u"<math> </math>", u"")
        text2 = u"".join(parts)
        #print text
        if text != text2:
            page.put(text2, u"replaced <amsmath> with <math>")
            print "%d: %s" % (i, page.title())
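        # Fetch the rendered page and check it for parser errors
        # ("Parser-Fehler" is the text the German wiki interface shows)
        html = site.getUrl(u"/mathe1/index.php/" + page.urlname())
        #print html
        if html.find(u"Parser-Fehler") != -1:
            print u"   Parser Fehler!!!!: " + page.title()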


Run it from the pywikipedia folder: python amsmath-bot.py

Of course, you first have to create a bot user in the wiki (Amsmath-Bot in my case) and, ideally, have an admin add it to the Bots group. The bot's many edits then stay out of the recent changes by default, but they can still be shown on request.

The code is probably not quite optimal, but it works.
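
The trickiest step in the script is deleting empty math tags while leaving HTML comments untouched. The following sketch shows one way to try that step in isolation, without a wiki. It assumes Python 3, and the names strip_empty_math, COMMENT and EMPTY_MATH are hypothetical helpers for this illustration, not part of pywikipedia or of the original script.

```python
import re

# Matches one complete HTML comment, including multi-line ones.
# The capturing group makes split() keep the comments in its result.
COMMENT = re.compile(r"(<!--.*?-->)", re.DOTALL)
EMPTY_MATH = ("<math></math>", "<math> </math>")


def strip_empty_math(text):
    """Delete empty <math> tags except inside <!-- ... --> comments."""
    parts = COMMENT.split(text)
    # Even indexes hold the text outside comments, odd indexes the comments.
    for index in range(0, len(parts), 2):
        for tag in EMPTY_MATH:
            parts[index] = parts[index].replace(tag, "")
    return "".join(parts)


print(strip_empty_math("a<math></math>b<!-- <math></math> -->c<math> </math>"))
# prints: ab<!-- <math></math> -->c
```

An unterminated "<!--" does not match the pattern, so the text after it is treated as ordinary text and cleaned as well.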