-
Notifications
You must be signed in to change notification settings - Fork 0
/
CONTENTS
334 lines (311 loc) · 19.5 KB
/
CONTENTS
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
This is a package to build robots for MediaWiki wikis like Wikipedia. Some
example robots are included.
=======================================================================
PLEASE DO NOT PLAY WITH THIS PACKAGE. These programs can actually
modify the live wiki on the net, and proper wiki-etiquette should
be followed before running it on any wiki.
=======================================================================
To get started on proper usage of the bot framework, please refer to:
http://www.mediawiki.org/wiki/Manual:Pywikipediabot
The contents of the package are:
=== readme and config files ===
CONTENTS : THIS file
LICENSE : a reference to the MIT license
README : Short info string used by PyWikipediaBot Nightlies
setup.cfg : Setup file for automated tests of the package
version : package version collected by PyWikipediaBot Nightlies,
otherwise omitted
config.py : Configuration module containing all defaults. Do not
change these! See below how to change values.
fixes.py : Stores predefined replacements used by replace.py.
generate_family_file.py: Creates a new family file.
generate_user_files.py : Creates user-config.py or user-fixes.py.
=== Library routines ===
apispec.py : Library to handle special pages through API
botlist.py : Allows access to the site's bot user list.
catlib.py : Library routines written especially to handle
category pages and recurse over category contents.
daemonize.py : The process will fork to the background and return
control to the terminal
date.py : Date formats in various languages
diskcache.py : Disk caching module
family.py : Abstract superclass for wiki families. Subclassed by
the classes in the 'families' subdirectory.
gui.py : Some GUI elements for solve_disambiguation.py
logindata.py : Use of pywikipedia as a library.
mysql_autoconnection.py: A small MySQL wrapper that catches dead MySQL
connections, and tries to reconnect them.
pagegenerators.py : Generator pages.
pageimport.py : Import pages from a certain wiki to another.
query.py : API query library
rciw.py : A IRC script to check for Recent Changes through IRC,
and to check for interwikis in those recently modified
articles.
simple_family.py : Family file in conjunction with none-pywikipedia
config files
titletranslate.py : rules and tricks to auto-translate wikipage titles
userlib.py : Library to work with users, their pages and talk pages
wikicomserver.py : This library allows the use of the pywikipediabot
directly from COM-aware applications.
wikipedia.py : The wikipedia library
wikipediatools.py : Returns package base directory
wiktionary.py : The wiktionary library
xmlreader.py : Reading and parsing XML dump files.
=== Utilities ===
articlenos.py : Displays the ordinal number of the new articles being
created visible on the Recent Changes list.
basic.py : Is a template from which simple bots can be made.
delinker.py : Delinks and replaces images.
editarticle.py : Edit an article with your favourite editor. Run
the script with the "--help" option to get
detailed infortion on possiblities.
extract_wikilinks.py : Two bots to get all linked-to wiki pages from an
HTML-file. They differ in their output:
extract_names gives bare names (can be used for
solve_disambiguation.py, table2wiki.py or
windows-chars.py), extract_wikilinks gives them in
interwiki-link format (can be used for
interwiki.py)
followlive.py : Periodically grab the list of new articles and analyze
them. If the article is too short, a menu will let you
easily add a template.
get.py : Script to get a page and write its contents to standard
output.
login.py : Log in to an account on your "home" wiki. or check the
login status
maintainer.py : Shares tasks between workers.
maintcont.py : The controller bot for maintainer.py
splitwarning.py : split an interwiki.log file into warning files for each
separate language. suggestion: Zip the created
files up, put them somewhere on the internet, and
send an announcement of the location on the robot
mailinglist.
testfamily.py : Check whether you are logged in all known languages
in a family.
rcsort.py : A tool to see the recentchanges ordered by user instead
of by date.
upd-log.py : Update notification script
version.py : Outputs Pywikipedia's revision number, Python's version
and OS used.
warnfile.py : A robot that parses a warning file created by
interwiki.py on another language wiki, and
implements the suggested changes without verifying
them.
watchlist.py : Allows access to the bot account's watchlist.
=== Robots ===
add_text.py : Adds text at the top or end of pages
archivebot.py : Archives discussion threads
blockpagechecker.py : Deletes any protection templates that are on pages
which aren't actually protected.
capitalize_redirects.py: Script to create a redirect of capitalize articles.
casechecker.py : Script to enumerate all pages in the wikipedia and
find all titles with mixed Latin and Cyrillic
alphabets.
catall.py : Add or change categories on a number of pages.
category.py : Add a category link to all pages mentioned on a page,
change or remove category tags
category_redirect.py : Maintain category redirects and replace links to
redirected categories.
censure.py : Bad word checker bot.
cfd.py : Processes the categories for discussion working page.
It parses out the actions that need to be taken as a
result of CFD discussions and performs them.
checkimages.py : Check recently uploaded files. Checks if a file
description is present and if there are other problems
in the image's description.
clean_sandbox.py : This bot makes the cleaned of the page of tests.
commons_category_redirect.py: Cleans Commons:Category:Non-empty category
redirects by moving all the files, pages and categories
from redirected category to the target category.
commons_link.py : This robot include commons template to linking Commons
and your wiki project.
commonscat.py : Adds {{commonscat}} to Wikipedia categories (or
articles), if other language wikipedia already has such
a template
copyright.py : This robot check copyright text in Google, Yahoo! and
Live Search.
copyright_clean.py : Remove reports of copyright.py on wiki pages.
Uses YurikAPI.
copyright_put .py : Put reports of copyright.py on wiki pages.
cosmetic_changes.py : Can do slight modifications to a wiki page source code
such that the code looks cleaner.
create_categories.py : Program to batch create categories.
data_ingestion.py : A generic bot to do batch uploading to Commons.
deledpimage.py : Remove EDP images in non-article namespaces.
delete.py : This script can be used to delete pages en masse.
disambredir.py : Changing redirect names in disambiguation pages.
djvutext.py : Extracts OCR text from djvu files and uploads onto
pages in the "Page" namespace on Wikisource.
featured.py : A robot to check feature articles.
fixing_redirects.py : Correct all redirect links of processed pages.
flickrripper.py : Upload images from Flickr easily.
followlive.py : follow new articles on a wikipedia and flag them
with a template.
image.py : This script can be used to change one image to another
or remove an image entirely.
imagecopy.py : Copies images from a wikimedia wiki to Commons
imagecopy_self.py : Copy self published files from the English Wikipedia to
Commons.
imageharvest.py : Bot for getting multiple images from an external site.
iamgerecat.py : Try to find categories for media on Commons.
imagetransfer.py : Given a wiki page, check the interwiki links for
images, and let the user choose among them for
images to upload.
imageuncat.py : Adds uncat template to images without categories at
Commons
inline_images.py : This bot looks for images that are linked inline
(i.e., they are hosted from an external server and
hotlinked).
interwiki.py : A robot to check interwiki links on all pages (or
a range of pages) of a wiki.
interwiki_graph.py : Possible create graph with interwiki.py.
isbn.py : Bot to convert all ISBN-10 codes to the ISBN-13
format.
lonelypages.py : Place a template on pages which are not linked to by
other pages, and are therefore lonely
makecat.py : Given an existing or new category, find pages for that
category.
match_images.py : Match two images based on histograms.
misspelling.py : Similar to solve_disambiguation.py. It is supposed to
fix links that contain common spelling mistakes.
movepages.py : Bot page moves to another title.
ndashredir.py : Creates hyphenated redirects to articles with n dash
or m dash in their title.
noreferences.py : Searches for pages where <references /> is missing
although a <ref> tag is present, and in that case adds
a new references section.
nowcommons.py : This bot can delete images with NowCommons template.
pagefromfile.py : This bot takes its input from a file that contains a
number of pages to be put on the wiki.
panoramiopicker.py : Upload images from Panoramio easily.
patrol.py : Obtains a list pages and marks the edits as patrolled
based on a whitelist.
piper.py : Pipes article text through external program(s) on
STDIN and collects its STDOUT which is used as the
new article text if it differs from the original.
protect.py : Protect and unprotect pages en masse.
redirect.py : Fix double redirects and broken redirects. Note:
solve_disambiguation also has functions which treat
redirects.
reflinks.py : Search for references which are only made of a link
without title and fetch the html title from the link to
use it as the title of the wiki link in the reference.
replace.py : Search articles for a text and replace it by another
text. Both text are set in two configurable
text files. The bot can either work on a set of given
pages or crawl an SQL dump.
revertbot.py : Revert edits.
saveHTML.py : Downloads the HTML-pages of articles and images.
selflink.py : This bot goes over multiple pages of the home wiki,
searches for selflinks, and allows removing them.
solve_disambiguation.py: Interactive robot doing disambiguation.
spamremove.py : Remove links that are being or have been spammed.
speedy_delete.py : This bot load a list of pages from the category of
candidates for speedy deletion and give the
user an interactive prompt to decide whether
each should be deleted or not.
spellcheck.py : This bot spellchecks wiki pages.
standardize_interwiki.py:A robot that downloads a page, and reformats the
interwiki links in a standard way (i.e. move all
of them to the bottom or the top, with the same
separator, in the right order).
standardize_notes.py : Converts external links and notes/references to
: Footnote3 ref/note format. Rewrites References.
statistics_in_wikitable.py: This bot renders statistics provided by
[[Special:Statistics]] in a table on a wiki page.
Thus it creates and updates a statistics wikitable.
sum_disc.py : Summarize discussions spread over the whole wiki
including all namespaces
table2wiki.py : Semi-automatic converting HTML-tables to wiki-tables.
tag_nowcommons.py : Tag files available at Commons with the Nowcommons
template
template.py : change one template (that is {{...}}) into another.
templatecount.py : Display the list of pages transcluding a given list
of templates.
touch.py : Bot goes over all pages of the home wiki, and edits
them without changing.
unlink.py : This bot unlinks a page on every page that links to it.
unusedfiles.py : Bot appends some text to all unused images and other
text to the respective uploaders.
upload.py : upload an image to a wiki.
us-states.py : A robot to add redirects to cities for US state
abbreviations.
weblinkchecker.py : Check if external links are still working.
welcome.py : Script to welcome new users.
=== Directories ===
botlist : Contains cached bot users
cache : Contains disk cached pages and data retrieved by
featured.y
category :
commonsdelinker : Contains commons delinker bot maintained by Siebrand
copyright : Contains information retrieved by copyright.py
deadlinks : Contains information retrieved by weblinkchecker.py
disambiguations : If you run solve_disambiguation.py with the -primary
argument, the bot will save information here
externals : Contains all external software that might be used by
by PyWikipediaBot scripts and libraries. The most
important among them are:
* simplejson; used by query.py and needed for python
release prev. 2.6
* spelling; dictionaries for spellcheck.py
* BeautifulSoup.py; http://www.crummy.com/software/BeautifulSoup
(and more)
Missing packages are installed or downloaded automa-
tically by externals/__init__.py.
families : Contains wiki-specific information like URLs,
languages, encodings etc.
i18n : Contains i18n translations for bot edit summaries
interwiki_dumps : If the interwiki bot is interrupted, it will store
a dump file here. These files will be read when using
the interwiki bot with -restore or -continue.
interwiki_graphs : Contains graphs for interwiki_graph.py
login-data : login.py stores your cookies here (Your password won't
be stored as plaintext).
logs : Contains logfiles.
maintenance : contains maintenance scripts for the developing team
pywikibot : Contains some libraries and control files
test : Some test stuff for the developing team
userinterfaces : Contains Tkinter, WxPython, terminal and
transliteration interfaces user choose in
user-config.py
watchlists : Information retrieved by watchlist.py will be stored
here.
wiktionary : Contains script to used for Wiktionary project.
External software can be used with PyWikipediaBot:
* Win32com library for use with wikicomserver.py
* Pydot, Pyparsing and Graphviz for use with interwiki_graph.py
* JSON for use with query.py
* PyGoogle to access Google Web API and PySearch to access Yahoo! Search
Web Services for use with copyright.py and pagegenerators.py
* MySQLdb to access MySQL database for use with pagegenerators.py
PyWikipediaBot makes use of some modules that are part of python, but that
are not installed by default on some Linux distributions:
* python-xml (required to parse XML via SaX2)
* python-celementtree (recommended if you use XML dumps)
* python-tkinter (optional, used by some experimental GUI stuff)
All other (more exotic) external software is installed automatically by the
externals module, if missing. This is done by using several methods in following
order; (svn:externals), OS package management system, download and extract from
URL, clone of mercurial repository.
More precise information, and a list of the options that are available for
the various programs, can be retrieved by running the bot with the -help
parameter, e.g.
python interwiki.py -help
You need to have at least python version 2.7.2 (http://www.python.org/download/)
or newer installed on your computer to be able to run any of the code in this
package, but not 3.x, because pywikipediabot is still not updated to it! Support
for older versions of python is not planned. Some scripts could run with older
python releases. Please refer the manual at mediawiki for further details and
restrictions.
You do not need to "install" this package to be able to make use of
it. You can actually just run it from the directory where you unpacked
it or where you have your copy of the SVN sources.
The first time you run a script, the package creates a file named user-config.py
in your current directory. It asks for the family and language code you are
working on and at least for the bot's user name; this will be used to identify
you when the robot is making changes, in case you are not logged in. You may
choose to create a small or extended version of the config file with further
informations. Other variables that can be set in the configuration file, please
check config.py for ideas.
After that, you are advised to create a username + password for the bot, and
run login.py. Anonymous editing is not possible.