#!/usr/bin/python
# -*- coding: utf-8 -*-
"""
Scripts to manage categories.

Syntax: python category.py action [-option]

where action can be one of these:
 * add     - mass-add a category to a list of pages
 * remove  - remove category tag from all pages in a category
 * move    - move all pages in a category to another category
 * tidy    - tidy up a category by moving its articles into subcategories
 * tree    - show a tree of subcategories of a given category
 * listify - make a list of all of the articles that are in a category

and option can be one of these:

Options for "add" action:
 * -person     - sort persons by their last name
 * -create     - If a page doesn't exist, do not skip it, create it instead

If action is "add", the following options are supported:

&params;

Options for "listify" action:
 * -overwrite  - This overwrites the current page with the list even if
                 something is already there.
 * -showimages - This displays images rather than linking them in the list.
 * -talkpages  - This outputs the links to talk pages of the pages to be
                 listified in addition to the pages themselves.

Options for "remove" action:
 * -nodelsum   - This specifies not to use the custom edit summary as the
                 deletion reason. Instead, it uses the default deletion reason
                 for the language, which is "Category was disbanded" in
                 English.

Options for "move" action:
 * -hist       - Creates a nice wikitable on the talk page of target category
                 that contains detailed page history of the source category.

Options for several actions:
 * -rebuild    - reset the database
 * -from:      - The category to move from (for the move option)
                 Also, the category to remove from in the remove option
                 Also, the category to make a list of in the listify option
 * -to:        - The category to move to (for the move option)
               - Also, the name of the list to make in the listify option
         NOTE: If the category names have spaces in them you may need to use
         a special syntax in your shell so that the names aren't treated as
         separate parameters. For instance, in BASH, use single quotes,
         e.g. -from:'Polar bears'
 * -batch      - Don't prompt to delete emptied categories (do it
                 automatically).
 * -summary:   - Pick a custom edit summary for the bot.
 * -inplace    - Use this flag to change categories in place rather than
                 rearranging them.
 * -recurse    - Recurse through all subcategories of categories.
 * -pagesonly  - While removing pages from a category, keep the subcategory
                 links and do not remove them
 * -match      - Only work on pages whose titles match the given regex (for
                 move and remove actions).

For the actions tidy and tree, the bot will store the category structure
locally in category.dump.bz2. This saves time and server load, but if it uses
these data later, they may be outdated; use the -rebuild parameter in this
case.

For example, to create a new category from a list of persons, type:

  python category.py add -person

and follow the on-screen instructions.

Or to do it all from the command-line, use the following syntax:

  python category.py move -from:US -to:'United States'

This will move all pages in the category US to the category United States.

"""
#
# (C) Rob W.W. Hooft, 2004
# (C) Daniel Herding, 2004
# (C) Wikipedian, 2004-2008
# (C) leogregianin, 2004-2008
# (C) Cyde, 2006-2010
# (C) Anreas J Schwab, 2007
# (C) xqt, 2009-2012
# (C) Pywikipedia team, 2008-2012
#
__version__ = '$Id$'
#
# Distributed under the terms of the MIT license.
#
import os, re, pickle, bz2
import wikipedia as pywikibot
import catlib, config, pagegenerators
from pywikibot import i18n
# This is required for the text that is shown when you run this script
# with the parameter -help.
docuReplacements = {
    '&params;': pagegenerators.parameterHelp
}

cfd_templates = {
    'wikipedia' : {
        'en':[u'cfd', u'cfr', u'cfru', u'cfr-speedy', u'cfm', u'cfdu'],
        'fi':[u'roskaa', u'poistettava', u'korjattava/nimi', u'yhdistettäväLuokka'],
        'he':[u'הצבעת מחיקה', u'למחוק'],
        'nl':[u'categorieweg', u'catweg', u'wegcat', u'weg2']
    },
    'commons' : {
        'commons':[u'cfd', u'move']
    }
}


class CategoryDatabase:
    '''This is a temporary knowledge base saving for each category the contained
    subcategories and articles, so that category pages do not need to be loaded
    over and over again

    '''
    def __init__(self, rebuild = False, filename = 'category.dump.bz2'):
        if rebuild:
            self.rebuild()
        else:
            try:
                if not os.path.isabs(filename):
                    filename = pywikibot.config.datafilepath(filename)
                f = bz2.BZ2File(filename, 'r')
                pywikibot.output(u'Reading dump from %s'
                                 % pywikibot.config.shortpath(filename))
                databases = pickle.load(f)
                f.close()
                # keys are categories, values are 2-tuples with lists as entries.
                self.catContentDB = databases['catContentDB']
                # like the above, but for supercategories
                self.superclassDB = databases['superclassDB']
                del databases
            except Exception:
                # If something goes wrong (missing or unreadable dump),
                # just rebuild the database
                self.rebuild()

    def rebuild(self):
        self.catContentDB={}
        self.superclassDB={}

    def getSubcats(self, supercat):
        '''For a given supercategory, return a list of Category objects for
        all its subcategories. Saves this list in a temporary database so that
        it won't be loaded from the server next time it's required.

        '''
        # if we already know which subcategories exist here
        if supercat in self.catContentDB:
            return self.catContentDB[supercat][0]
        else:
            subcatlist = supercat.subcategoriesList()
            articlelist = supercat.articlesList()
            # add to dictionary
            self.catContentDB[supercat] = (subcatlist, articlelist)
            return subcatlist

    def getArticles(self, cat):
        '''For a given category, return a list of Pages for all its articles.
        Saves this list in a temporary database so that it won't be loaded from
        the server next time it's required.

        '''
        # if we already know which articles exist here
        if cat in self.catContentDB:
            return self.catContentDB[cat][1]
        else:
            subcatlist = cat.subcategoriesList()
            articlelist = cat.articlesList()
            # add to dictionary
            self.catContentDB[cat] = (subcatlist, articlelist)
            return articlelist

    def getSupercats(self, subcat):
        '''For a given subcategory, return a list of Category objects for all
        its supercategories, cached in the same way as getSubcats.

        '''
        # if we already know which supercategories exist here
        if subcat in self.superclassDB:
            return self.superclassDB[subcat]
        else:
            supercatlist = subcat.supercategoriesList()
            # add to dictionary
            self.superclassDB[subcat] = supercatlist
            return supercatlist

    def dump(self, filename = 'category.dump.bz2'):
        '''Saves the contents of the dictionaries superclassDB and catContentDB
        to disk.

        '''
        if not os.path.isabs(filename):
            filename = pywikibot.config.datafilepath(filename)
        if self.catContentDB or self.superclassDB:
            pywikibot.output(u'Dumping to %s, please wait...'
                             % pywikibot.config.shortpath(filename))
            f = bz2.BZ2File(filename, 'w')
            databases = {
                'catContentDB': self.catContentDB,
                'superclassDB': self.superclassDB
            }
            # store dump to disk in binary format
            try:
                pickle.dump(databases, f, protocol=pickle.HIGHEST_PROTOCOL)
            except pickle.PicklingError:
                pass
            f.close()
        else:
            try:
                os.remove(filename)
            except EnvironmentError:
                pass
            else:
                pywikibot.output(u'Database is empty. %s removed'
                                 % pywikibot.config.shortpath(filename))
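
# The dump/reload cycle used by CategoryDatabase can be tried in isolation.
# What follows is a minimal sketch, not part of the original script: the
# helper names save_cache and load_cache are hypothetical, and plain strings
# stand in for the Category and Page objects that the real dictionaries hold.
# It only assumes the standard bz2 and pickle modules.

```python
import bz2
import os
import pickle
import tempfile


def save_cache(path, cat_content, superclass):
    """Write both dictionaries into one bz2-compressed pickle file."""
    databases = {'catContentDB': cat_content, 'superclassDB': superclass}
    f = bz2.BZ2File(path, 'w')
    try:
        pickle.dump(databases, f, protocol=pickle.HIGHEST_PROTOCOL)
    finally:
        f.close()


def load_cache(path):
    """Return (cat_content, superclass); use empty dicts if loading fails."""
    try:
        f = bz2.BZ2File(path, 'r')
        try:
            databases = pickle.load(f)
        finally:
            f.close()
        return databases['catContentDB'], databases['superclassDB']
    except (IOError, OSError, EOFError, KeyError, pickle.UnpicklingError):
        # Same policy as CategoryDatabase.__init__: start from scratch.
        return {}, {}


# Round trip through a temporary file.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.dump.bz2')
save_cache(demo_path, {'A': (['B'], ['p1'])}, {'B': ['A']})
contents, supers = load_cache(demo_path)
```

# Falling back to empty dictionaries mirrors the "just rebuild" behaviour
# above: a missing or damaged dump costs extra server requests, not a crash.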


class AddCategory:
    '''A robot to mass-add a category to a list of pages.'''

    def __init__(self, generator, sort_by_last_name=False, create=False,
                 editSummary='', dry=False):
        self.generator = generator
        self.sort = sort_by_last_name
        self.create = create
        self.site = pywikibot.getSite()
        self.always = False
        self.dry = dry
        self.newcatTitle = None
        self.editSummary = editSummary

    def sorted_by_last_name(self, catlink, pagelink):
        '''Return a Category with key that sorts persons by their last name.

        Parameters: catlink  - The Category to be linked
                    pagelink - the Page to be placed in the category

        Trailing words in brackets will be removed. Example: If
        catlink is [[Category:Author]] and pagelink is a Page to [[Alexandre
        Dumas (senior)]], this function will return this Category:
        [[Category:Author|Dumas, Alexandre]]

        '''
        page_name = pagelink.title()
        site = pagelink.site
        # regular expression that matches a name followed by a space and
        # disambiguation brackets. Group 1 is the name without the rest.
        bracketsR = re.compile(r'(.*) \(.+?\)')
        match_object = bracketsR.match(page_name)
        if match_object:
            page_name = match_object.group(1)
        split_string = page_name.split(' ')
        if len(split_string) > 1:
            # pull last part of the name to the beginning, and append the
            # rest after a comma; e.g., "John von Neumann" becomes
            # "Neumann, John von"
            sorted_key = split_string[-1] + ', ' + \
                         ' '.join(split_string[:-1])
            # give explicit sort key
            return pywikibot.Page(site, catlink.title() + '|' + sorted_key)
        else:
            return pywikibot.Page(site, catlink.title())

    def run(self):
        self.newcatTitle = pywikibot.input(
            u'Category to add (do not give namespace):')
        if not self.site.nocapitalize:
            self.newcatTitle = self.newcatTitle[:1].upper() + \
                               self.newcatTitle[1:]
        if not self.editSummary:
            self.editSummary = i18n.twtranslate(self.site, 'category-adding',
                                                {'newcat': self.newcatTitle})
        counter = 0
        for page in self.generator:
            self.treat(page)
            counter += 1
        pywikibot.output(u"%d page(s) processed." % counter)

    def load(self, page):
        """
        Loads the given page and returns its text, or None if the page is
        missing or is a redirect.
        """
        try:
            # Load the page
            text = page.get()
        except pywikibot.NoPage:
            if self.create:
                pywikibot.output(u"Page %s doesn't exist yet; creating."
                                 % (page.title(asLink=True)))
                text = ''
            else:
                pywikibot.output(u"Page %s does not exist; skipping."
                                 % page.title(asLink=True))
        except pywikibot.IsRedirectPage, arg:
            redirTarget = pywikibot.Page(self.site, arg.args[0])
            pywikibot.warning(u"Page %s is a redirect to %s; skipping."
                              % (page.title(asLink=True),
                                 redirTarget.title(asLink=True)))
        else:
            return text
        return None

    def save(self, text, page, comment, minorEdit=True, botflag=True):
        # only save if something was changed
        if text != page.get():
            # show what was changed
            pywikibot.showDiff(page.get(), text)
            pywikibot.output(u'Comment: %s' %comment)
            if not self.dry:
                if not self.always:
                    confirm = 'y'
                    while True:
                        choice = pywikibot.inputChoice(
                            u'Do you want to accept these changes?',
                            ['Yes', 'No', 'Always'], ['y', 'N', 'a'], 'N')
                        if choice == 'a':
                            confirm = pywikibot.inputChoice(u"""\
This should be used if and only if you are sure that your links are correct!
Are you sure?""", ['Yes', 'No'], ['y', 'n'], 'n')
                            if confirm == 'y':
                                self.always = True
                                break
                        else: break
                if self.always or choice == 'y':
                    try:
                        # Save the page
                        page.put(text, comment=comment,
                                 minorEdit=minorEdit, botflag=botflag)
                    except pywikibot.LockedPage:
                        pywikibot.output(u"Page %s is locked; skipping."
                                         % page.title(asLink=True))
                    except pywikibot.EditConflict:
                        pywikibot.output(
                            u'Skipping %s because of edit conflict'
                            % (page.title()))
                    except pywikibot.SpamfilterError, error:
                        pywikibot.output(
                            u'Cannot change %s because of spam blacklist entry %s'
                            % (page.title(), error.url))
                    else:
                        return True
        return False

    def treat(self, page):
        text = self.load(page)
        if text is None:
            return
        cats = page.categories()
        # Show the title of the page we're working on.
        # Highlight the title in purple.
        pywikibot.output(
            u"\n\n>>> \03{lightpurple}%s\03{default} <<<"
            % page.title())
        pywikibot.output(u"Current categories:")
        for cat in cats:
            pywikibot.output(u"* %s" % cat.title())
        catpl = pywikibot.Page(self.site, self.newcatTitle, defaultNamespace=14)
        if catpl in cats:
            pywikibot.output(u"%s is already in %s."
                             % (page.title(), catpl.title()))
        else:
            if self.sort:
                catpl = self.sorted_by_last_name(catpl, page)
            pywikibot.output(u'Adding %s' % catpl.title(asLink=True))
            cats.append(catpl)
            text = pywikibot.replaceCategoryLinks(text, cats)
            if not self.save(text, page, self.editSummary):
                pywikibot.output(u'Page %s not saved.'
                                 % page.title(asLink=True))
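
# The sort-key rule described in sorted_by_last_name (drop a trailing bracketed
# disambiguation, then move the last word to the front) can be checked without
# a wiki connection. This is a minimal sketch under that reading of the
# docstring; last_name_sort_key is a hypothetical helper that works on plain
# title strings instead of Page objects.

```python
import re

# Matches a title followed by a space and a bracketed disambiguation suffix.
BRACKETS_R = re.compile(r'(.*) \(.+?\)')


def last_name_sort_key(title):
    """Return 'Last, First ...' for a multi-word title, else the title."""
    match = BRACKETS_R.match(title)
    if match:
        # Group 1 is the name without the bracketed suffix.
        title = match.group(1)
    words = title.split(' ')
    if len(words) > 1:
        return words[-1] + ', ' + ' '.join(words[:-1])
    return title


dumas_key = last_name_sort_key('Alexandre Dumas (senior)')
neumann_key = last_name_sort_key('John von Neumann')
```

# The rule is purely positional: it always treats the final word as the
# surname, so name particles such as "von" stay with the given names.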


class CategoryMoveRobot:
    """Robot to move pages from one category to another."""

    def __init__(self, oldCatTitle, newCatTitle, batchMode=False,
                 editSummary='', inPlace=False, moveCatPage=True,
                 deleteEmptySourceCat=True, titleRegex=None,
                 useSummaryForDeletion=True, withHistory=False):
        site = pywikibot.getSite()
        self.editSummary = editSummary
        self.oldCat = catlib.Category(site, oldCatTitle)
        self.newCatTitle = newCatTitle
        self.inPlace = inPlace
        self.moveCatPage = moveCatPage
        self.batchMode = batchMode
        self.deleteEmptySourceCat = deleteEmptySourceCat
        self.titleRegex = titleRegex
        self.useSummaryForDeletion = useSummaryForDeletion
        self.withHistory = withHistory

    def run(self):
        site = pywikibot.getSite()
        newCat = catlib.Category(site, self.newCatTitle)
        # set edit summary message
        if self.useSummaryForDeletion and self.editSummary:
            reason = self.editSummary
        else:
            reason = i18n.twtranslate(site, 'category-was-moved') \
                     % {'newcat': self.newCatTitle, 'title': self.newCatTitle}
        if not self.editSummary:
            self.editSummary = i18n.twtranslate(site, 'category-changing') \
                               % {'oldcat':self.oldCat.title(),
                                  'newcat':newCat.title()}

        # Copy the category contents to the new category page
        copied = False
        oldMovedTalk = None
        if self.oldCat.exists() and self.moveCatPage:
            copied = self.oldCat.copyAndKeep(
                self.newCatTitle,
                pywikibot.translate(site, cfd_templates))
            # Also move the talk page
            if copied:
                oldTalk = self.oldCat.toggleTalkPage()
                if oldTalk.exists():
                    newTalkTitle = newCat.toggleTalkPage().title()
                    try:
                        talkMoved = oldTalk.move(newTalkTitle, reason)
                    except (pywikibot.NoPage, pywikibot.PageNotSaved), e:
                        # in order:
                        # Source talk does not exist, or
                        # Target talk already exists
                        pywikibot.output(e.message)
                    else:
                        if talkMoved:
                            oldMovedTalk = oldTalk

                if self.withHistory:
                    # Whether or not there was an old talk page, we write
                    # the page history to the new talk page
                    history = self.oldCat.getVersionHistoryTable()
                    # Set the section title for the old cat's history on the new
                    # cat's talk page.
                    sectionTitle = i18n.twtranslate(site,
                                                    'category-section-title') \
                                   % {'oldcat': self.oldCat.title()}
                    # Should be OK, we are within if self.oldCat.exists()
                    historySection = u'\n== %s ==\n%s' % (sectionTitle, history)
                    try:
                        text = newCat.toggleTalkPage().get() + historySection
                    except pywikibot.NoPage:
                        text = historySection
                    try:
                        newCat.toggleTalkPage().put(
                            text, i18n.twtranslate(site,
                                                   'category-version-history')
                            % {'oldcat': self.oldCat.title()})
                    except:
                        pywikibot.output(
                            'History of the category has not been saved to new talk page')
                        # TODO: some nicer exception handling (not too important)
                        #       first move the page, then tag the version history

        # Move articles
        gen = pagegenerators.CategorizedPageGenerator(self.oldCat,
                                                      recurse=False)
        preloadingGen = pagegenerators.PreloadingGenerator(gen)
        for article in preloadingGen:
            if not self.titleRegex or re.search(self.titleRegex,
                                                article.title()):
                catlib.change_category(article, self.oldCat, newCat,
                                       comment=self.editSummary,
                                       inPlace=self.inPlace)

        # Move subcategories
        gen = pagegenerators.SubCategoriesPageGenerator(self.oldCat,
                                                        recurse=False)
        preloadingGen = pagegenerators.PreloadingGenerator(gen)
        for subcategory in preloadingGen:
            if not self.titleRegex or re.search(self.titleRegex,
                                                subcategory.title()):
                catlib.change_category(subcategory, self.oldCat, newCat,
                                       comment=self.editSummary,
                                       inPlace=self.inPlace)

        # Delete the old category and its moved talk page
        if copied and self.deleteEmptySourceCat:
            if self.oldCat.isEmptyCategory():
                confirm = not self.batchMode
                self.oldCat.delete(reason, confirm, mark = True)
                if oldMovedTalk is not None:
                    oldMovedTalk.delete(reason, confirm, mark = True)
            else:
                pywikibot.output('Couldn\'t delete %s - not empty.'
                                 % self.oldCat.title())


class CategoryListifyRobot:
    '''Creates a list containing all of the members in a category.'''

    def __init__(self, catTitle, listTitle, editSummary, overwrite = False,
                 showImages = False, subCats = False, talkPages = False,
                 recurse = False):
        self.editSummary = editSummary
        self.overwrite = overwrite
        self.showImages = showImages
        self.site = pywikibot.getSite()
        self.cat = catlib.Category(self.site, 'Category:' + catTitle)
        self.list = pywikibot.Page(self.site, listTitle)
        self.subCats = subCats
        self.talkPages = talkPages
        self.recurse = recurse

    def run(self):
        listOfArticles = self.cat.articlesList(recurse = self.recurse)
        if self.subCats:
            listOfArticles += self.cat.subcategoriesList()
        if not self.editSummary:
            self.editSummary = i18n.twntranslate(self.site,
                                                 'category-listifying',
                                                 {'fromcat': self.cat.title(),
                                                  'num': len(listOfArticles)})

        listString = ""
        for article in listOfArticles:
            if (not article.isImage() or self.showImages) and not article.isCategory():
                if self.talkPages and not article.isTalkPage():
                    listString = listString + "*[[%s]] -- [[%s|talk]]\n" % (article.title(), article.toggleTalkPage().title())
                else:
                    listString = listString + "*[[%s]]\n" % article.title()
            else:
                if self.talkPages and not article.isTalkPage():
                    listString = listString + "*[[:%s]] -- [[%s|talk]]\n" % (article.title(), article.toggleTalkPage().title())
                else:
                    listString = listString + "*[[:%s]]\n" % article.title()
        if self.list.exists() and not self.overwrite:
            pywikibot.output(u'Page %s already exists, aborting.' % self.list.title())
        else:
            self.list.put(listString, comment=self.editSummary)
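
# The four link formats built in CategoryListifyRobot.run reduce to two
# decisions: whether the link needs a leading colon, and whether a talk link
# is appended. A minimal sketch of that rule follows; list_line and its
# keyword arguments are hypothetical and replace the Page methods isImage(),
# isCategory() and toggleTalkPage() with plain values.

```python
def list_line(title, talk_title=None, is_image=False, is_category=False,
              show_images=False):
    """Return one wikitext bullet linking to title (and optionally its talk)."""
    # Images (unless they should be displayed) and categories get a leading
    # colon so the entry is a link instead of an embed or a categorization.
    plain = (not is_image or show_images) and not is_category
    link = '[[%s]]' % title if plain else '[[:%s]]' % title
    if talk_title:
        return '*%s -- [[%s|talk]]\n' % (link, talk_title)
    return '*%s\n' % link


article_line = list_line('Polar bear', talk_title='Talk:Polar bear')
image_line = list_line('File:Bear.jpg', is_image=True)
```

# Passing talk_title=None corresponds to running without -talkpages, or to an
# entry that is itself a talk page.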


class CategoryRemoveRobot:
    '''Removes the category tag from all pages in a given category
    and if pagesonly parameter is False also from the category pages of all
    subcategories, without prompting. If the category is empty, it will be
    tagged for deleting. Does not remove category tags pointing at
    subcategories.

    '''

    def __init__(self, catTitle, batchMode=False, editSummary='',
                 useSummaryForDeletion=True, titleRegex=None, inPlace=False,
                 pagesonly=False):
        self.editSummary = editSummary
        self.site = pywikibot.getSite()
        self.cat = catlib.Category(self.site, 'Category:'+ catTitle)
        # get edit summary message
        self.useSummaryForDeletion = useSummaryForDeletion
        self.batchMode = batchMode
        self.titleRegex = titleRegex
        self.inPlace = inPlace
        self.pagesonly = pagesonly
        if not self.editSummary:
            self.editSummary = i18n.twtranslate(self.site, 'category-removing',
                                                {'oldcat': self.cat.title()})

    def run(self):
        articles = self.cat.articlesList(recurse = 0)
        if len(articles) == 0:
            pywikibot.output(u'There are no articles in category %s' % self.cat.title())
        else:
            for article in articles:
                if not self.titleRegex or re.search(self.titleRegex,article.title()):
                    catlib.change_category(article, self.cat, None, comment = self.editSummary, inPlace = self.inPlace)
        if self.pagesonly:
            return

        # Also removes the category tag from subcategories' pages
        subcategories = self.cat.subcategoriesList(recurse = 0)
        if len(subcategories) == 0:
            pywikibot.output(u'There are no subcategories in category %s' % self.cat.title())
        else:
            for subcategory in subcategories:
                catlib.change_category(subcategory, self.cat, None, comment = self.editSummary, inPlace = self.inPlace)
        # Deletes the category page
        if self.cat.exists() and self.cat.isEmptyCategory():
            if self.useSummaryForDeletion and self.editSummary:
                reason = self.editSummary
            else:
                reason = i18n.twtranslate(self.site, 'category-was-disbanded')
            talkPage = self.cat.toggleTalkPage()
            try:
                self.cat.delete(reason, not self.batchMode)
            except pywikibot.NoUsername:
                pywikibot.output(u'No sysop account is set up for %s; the category will not be deleted.' % self.cat.site())
                return
            if (talkPage.exists()):
                talkPage.delete(reason=reason, prompt=not self.batchMode)


class CategoryTidyRobot:
    """Script to help a human to tidy up a category by moving its articles into
    subcategories

    Specify the category name on the command line. The program will pick up the
    page, and look for all subcategories and supercategories, and show them with
    a number adjacent to them. It will then automatically loop over all pages
    in the category. It will ask you to type the number of the appropriate
    replacement, and perform the change robotically.

    If you don't want to move the article to a subcategory or supercategory, but to
    another category, you can use the 'j' (jump) command.

    Typing 's' will leave the complete page unchanged.

    Typing '?' will show you the first few bytes of the current page, helping
    you to find out what the article is about and in which other categories it
    currently is.

    Important:
     * this bot is written to work with the MonoBook skin, so make sure your bot
       account uses this skin

    """
    def __init__(self, catTitle, catDB):
        self.catTitle = catTitle
        self.catDB = catDB
        self.site = pywikibot.getSite()
        self.editSummary = i18n.twtranslate(self.site, 'category-changing')\
                           % {'oldcat':self.catTitle, 'newcat':u''}

    def move_to_category(self, article, original_cat, current_cat):
        '''
        Given an article which is in category original_cat, ask the user if
        it should be moved to one of original_cat's subcategories.
        Recursively run through subcategories' subcategories.
        NOTE: current_cat is only used for internal recursion. You should
        always use current_cat = original_cat.
        '''
        pywikibot.output(u'')
        # Show the title of the page where the link was found.
        # Highlight the title in purple.
        pywikibot.output(u'Treating page \03{lightpurple}%s\03{default}, currently in \03{lightpurple}%s\03{default}' % (article.title(), current_cat.title()))

        # Determine a reasonable amount of context to print
        try:
            full_text = article.get(get_redirect = True)
        except pywikibot.NoPage:
            pywikibot.output(u'Page %s not found.' % article.title())
            return
        try:
            contextLength = full_text.index('\n\n')
        except ValueError: # substring not found
            contextLength = 500
        if full_text.startswith(u'[['): # probably an image
            # Add extra paragraph.
            contextLength = full_text.find('\n\n', contextLength+2)
        if contextLength > 1000 or contextLength < 0:
            contextLength = 500
        print
        pywikibot.output(full_text[:contextLength])
        print

        subcatlist = self.catDB.getSubcats(current_cat)
        supercatlist = self.catDB.getSupercats(current_cat)
        alternatives = u'\n'
        if len(subcatlist) == 0:
            alternatives += u'This category has no subcategories.\n\n'
        if len(supercatlist) == 0:
            alternatives += u'This category has no supercategories.\n\n'
        # show supercategories and subcategories as possible choices
        # (with numbers)
        for i in range(len(supercatlist)):
            # layout: we don't expect a cat to have more than 10 supercats
            alternatives += (u"u%d - Move up to %s\n" % (i, supercatlist[i].title()))
        for i in range(len(subcatlist)):
            # layout: we don't expect a cat to have more than 100 subcats
            alternatives += (u"%2d - Move down to %s\n" % (i, subcatlist[i].title()))
        alternatives += u" j - Jump to another category\n"
        alternatives += u" s - Skip this article\n"
        alternatives += u" r - Remove this category tag\n"
        alternatives += u" l - list these options again\n"
        alternatives += u" m - more context\n"
        alternatives += (u"Enter - Save category as %s\n" % current_cat.title())
        flag = False
        longchoice = True
        while not flag:
            if longchoice:
                longchoice = False
                pywikibot.output(alternatives)
                choice = pywikibot.input(u"Option:")
            else:
                choice = pywikibot.input(u"Option (#, [j]ump, [s]kip, [r]emove, [l]ist, [m]ore context, [RETURN]):")
            if choice in ['s', 'S']:
                flag = True
            elif choice == '':
                pywikibot.output(u'Saving category as %s' % current_cat.title())
                if current_cat == original_cat:
                    print 'No changes necessary.'
                else:
                    newcat = u'[[:%s|%s]]' % (current_cat.title(savetitle=True),
                                              current_cat.title(withNamespace=False))
                    editsum = i18n.twtranslate(pywikibot.getSite(),
                                               'category-replacing',
                                               {'oldcat': original_cat.title(withNamespace=False),
                                                'newcat': newcat})
                    if pywikibot.getSite().family.name == "commons":
                        if original_cat.title(withNamespace=False).startswith("Media needing categories as of"):
                            parts = original_cat.title().split()
                            catstring = u"{{Uncategorized|year=%s|month=%s|day=%s}}"%(parts[-1], parts[-2], parts[-3])
                            if catstring in article.get():
                                article.put(article.get().replace(catstring, u"[[%s]]"%current_cat.title(savetitle=True)), comment = editsum)
                                flag = True
                    if not flag:
                        catlib.change_category(article, original_cat, current_cat, comment = editsum)
                flag = True
            elif choice in ['j', 'J']:
                newCatTitle = pywikibot.input(u'Please enter the category the article should be moved to:')
                newCat = catlib.Category(pywikibot.getSite(), 'Category:' + newCatTitle)
                # recurse into chosen category
                self.move_to_category(article, original_cat, newCat)
                flag = True
            elif choice in ['r', 'R']:
                # remove the category tag
                catlib.change_category(article, original_cat, None, comment = self.editSummary)
                flag = True
            elif choice in ['l', 'L']:
                longchoice = True
            elif choice in ['m', 'M', '?']:
                contextLength += 500
                print
                pywikibot.output(full_text[:contextLength])
                print

                # if categories possibly weren't visible, show them additionally
                # (maybe this should always be shown?)
                if len(full_text) > contextLength:
                    print ''
                    print 'Original categories: '
                    for cat in article.categories():
                        pywikibot.output(u'* %s' % cat.title())
            elif choice[0] == 'u':
                try:
                    supercat = supercatlist[int(choice[1:])]
                except (ValueError, IndexError):
                    # unknown command or number out of range. Prompt again.
                    continue
                self.move_to_category(article, original_cat, supercat)
                flag = True
            else:
                try:
                    subcat = subcatlist[int(choice)]
                except (ValueError, IndexError):
                    # unknown command or number out of range. Prompt again.
                    continue
                # recurse into subcategory
                self.move_to_category(article, original_cat, subcat)
                flag = True

    def run(self):
        cat = catlib.Category(self.site, 'Category:' + self.catTitle)

        articles = cat.articlesList(recurse = False)
        if len(articles) == 0:
            pywikibot.output(u'There are no articles in category ' + self.catTitle)
        else:
            preloadingGen = pagegenerators.PreloadingGenerator(iter(articles))
            for article in preloadingGen:
                pywikibot.output('')
                pywikibot.output(u'=' * 67)
                self.move_to_category(article, cat, cat)


class CategoryTreeRobot:
    '''
    Robot to create tree overviews of the category structure.

    Parameters:
        * catTitle - The category which will be the tree's root.
        * catDB    - A CategoryDatabase object
        * maxDepth - The limit beyond which no subcategories will be listed.
                     This also guarantees that loops in the category structure
                     won't be a problem.
        * filename - The textfile where the tree should be saved; None to print
                     the tree to stdout.
    '''

    def __init__(self, catTitle, catDB, filename = None, maxDepth = 10):
        self.catTitle = catTitle
        self.catDB = catDB
        if filename and not os.path.isabs(filename):
            filename = pywikibot.config.datafilepath(filename)
        self.filename = filename
        # TODO: make maxDepth changeable with a parameter or config file entry
        self.maxDepth = maxDepth
        self.site = pywikibot.getSite()

    def treeview(self, cat, currentDepth = 0, parent = None):
        '''
        Returns a multi-line string which contains a tree view of all subcategories
        of cat, up to level maxDepth. Recursively calls itself.

        Parameters:
            * cat          - the Category of the node we're currently opening
            * currentDepth - the current level in the tree (for recursion)
            * parent       - the Category of the category we're coming from
        '''

        result = u'#' * currentDepth
        result += '[[:%s|%s]]' % (cat.title(), cat.title().split(':', 1)[1])
        result += ' (%d)' % len(self.catDB.getArticles(cat))
        # We will remove an element of this array, but will need the original array
        # later, so we create a shallow copy with [:]
        supercats = self.catDB.getSupercats(cat)[:]
        # Find out which other cats are supercats of the current cat
        try:
            supercats.remove(parent)
        except ValueError:
            # parent is None or not among the supercategories
            pass
        if supercats != []:
            supercat_names = []
            for i in range(len(supercats)):
                # create a list of wiki links to the supercategories
                supercat_names.append('[[:%s|%s]]' % (supercats[i].title(), supercats[i].title().split(':', 1)[1]))
            # print this list, separated with commas, using the translated
            # 'category-also-in' message
            result += ' ' + i18n.twtranslate(self.site, 'category-also-in',
                                             {'alsocat': ', '.join(supercat_names)})
        result += '\n'
        if currentDepth < self.maxDepth:
            for subcat in self.catDB.getSubcats(cat):
                # recurse into subcategories
                result += self.treeview(subcat, currentDepth + 1, parent = cat)
        else:
            if self.catDB.getSubcats(cat) != []:
                # show that there are more categories beyond the depth limit
                result += '#' * (currentDepth + 1) + '[...]\n'
        return result

    def run(self):
        """Prints the multi-line string generated by treeview or saves it to a
        file.

        The root category (catTitle), the depth limit (maxDepth) and the
        output file (filename) are taken from the constructor arguments.
        """
        cat = catlib.Category(self.site, 'Category:' + self.catTitle)
        tree = self.treeview(cat)
        if self.filename:
            pywikibot.output(u'Saving results in %s' % self.filename)
            import codecs
            f = codecs.open(self.filename, 'a', 'utf-8')
            f.write(tree)
            f.close()
        else:
            pywikibot.output(tree, toStdout = True)
def main(*args):
global catDB
fromGiven = False
toGiven = False
batchMode = False
editSummary = ''
inPlace = False
overwrite = False
showImages = False
talkPages = False
recurse = False
withHistory = False
titleRegex = None
pagesonly = False
# This factory is responsible for processing command line arguments
# that are also used by other scripts and that determine which pages
# to work on.
genFactory = pagegenerators.GeneratorFactory()
# The generator gives the pages that should be worked upon.
gen = None
# If this is set to true then the custom edit summary given for removing
# categories from articles will also be used as the deletion reason.
useSummaryForDeletion = True
catDB = CategoryDatabase()
action = None
sort_by_last_name = False
restore = False
create_pages = False
for arg in pywikibot.handleArgs(*args):
if arg == 'add':
action = 'add'
elif arg == 'remove':
action = 'remove'
elif arg == 'move':
action = 'move'
elif arg == 'tidy':
action = 'tidy'
elif arg == 'tree':
action = 'tree'
elif arg == 'listify':
action = 'listify'
elif arg == '-person':
sort_by_last_name = True
elif arg == '-rebuild':
catDB.rebuild()
elif arg.startswith('-from:'):
oldCatTitle = arg[len('-from:'):].replace('_', ' ')
fromGiven = True
elif arg.startswith('-to:'):
newCatTitle = arg[len('-to:'):].replace('_', ' ')
toGiven = True
elif arg == '-batch':
batchMode = True
elif arg == '-inplace':
inPlace = True
elif arg == '-nodelsum':
useSummaryForDeletion = False
elif arg == '-overwrite':
overwrite = True
elif arg == '-showimages':
showImages = True
elif arg.startswith('-summary:'):
editSummary = arg[len('-summary:'):]
elif arg.startswith('-match'):
if len(arg) == len('-match'):
titleRegex = pywikibot.input(
u'Which regular expression should affected objects match?')
else:
titleRegex = arg[len('-match:'):]
elif arg == '-talkpages':
talkPages = True
elif arg == '-recurse':
recurse = True
elif arg == '-pagesonly':
pagesonly = True
elif arg == '-create':
create_pages = True
elif arg == '-hist':
withHistory = True
else:
genFactory.handleArg(arg)
if action == 'add':
# Note that the add functionality is the only bot that actually
# uses the generator factory. Every other bot creates its own
# generator exclusively from the command-line arguments that
# category.py understands.
if not gen:
gen = genFactory.getCombinedGenerator()
if not gen:
# default for backwards compatibility
genFactory.handleArg('-links')
# The preloading generator is responsible for downloading multiple
# pages from the wiki simultaneously.
gen = pagegenerators.PreloadingGenerator(
genFactory.getCombinedGenerator())
bot = AddCategory(gen, sort_by_last_name, create_pages, editSummary)
bot.run()
elif action == 'remove':
if not fromGiven:
oldCatTitle = pywikibot.input(
u'Please enter the name of the category that should be removed:')
bot = CategoryRemoveRobot(oldCatTitle, batchMode, editSummary,
useSummaryForDeletion, inPlace=inPlace,
pagesonly=pagesonly)
bot.run()
elif action == 'move':
if not fromGiven:
oldCatTitle = pywikibot.input(
u'Please enter the old name of the category:')
if not toGiven:
newCatTitle = pywikibot.input(
u'Please enter the new name of the category:')
bot = CategoryMoveRobot(oldCatTitle, newCatTitle, batchMode,
editSummary, inPlace, titleRegex=titleRegex,
withHistory=withHistory)
bot.run()
elif action == 'tidy':
catTitle = pywikibot.input(u'Which category do you want to tidy up?')
bot = CategoryTidyRobot(catTitle, catDB)
bot.run()
elif action == 'tree':
catTitle = pywikibot.input(
u'For which category do you want to create a tree view?')
filename = pywikibot.input(
u'Please enter the name of the file where the tree should be saved,\n'
u'or press enter to simply show the tree:')
bot = CategoryTreeRobot(catTitle, catDB, filename)
bot.run()
elif action == 'listify':
if not fromGiven:
oldCatTitle = pywikibot.input(
u'Please enter the name of the category to listify:')
if not toGiven:
newCatTitle = pywikibot.input(
u'Please enter the name of the list to create:')
bot = CategoryListifyRobot(oldCatTitle, newCatTitle, editSummary,
overwrite, showImages, subCats=True,
talkPages=talkPages, recurse=recurse)
bot.run()
else: