Commit 8b4871b

bwarsaw committed
PEP 292 classes Template and SafeTemplate are added to the string module.
This patch includes test cases and documentation updates, as well as NEWS file updates.

This patch also updates the sre modules so that they don't import the string module, breaking direct circular imports.

git-svn-id: http://svn.python.org/projects/python/trunk@37121 6015fed2-1504-0410-9fe1-9d1591cc4771
1 parent 22efef9 commit 8b4871b
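
To illustrate what the Template and SafeTemplate classes named in the commit message are for, here is a minimal sketch written against the `string.Template` API found in released Python versions (it runs on Python 3). It is not the API in this patch: the patch interpolates with the `%` operator and provides a separate `SafeTemplate` class, whereas released versions expose `substitute()` and `safe_substitute()` methods on a single class. The variable names are illustrative only.

```python
from string import Template

# Basic substitution: each $name placeholder is looked up in the mapping.
s = Template("$who likes $what")
result = s.substitute(who="tim", what="kung pao")

# A "$" that starts neither "$$", "$identifier" nor "${identifier}"
# is an invalid placeholder and raises ValueError.
try:
    Template("Give $who $100").substitute(who="tim")
    invalid_raised = False
except ValueError:
    invalid_raised = True

# Safe substitution: missing keys leave the original placeholder in
# place (with or without braces) instead of raising KeyError.
partial = Template("$who likes $what for ${meal}").safe_substitute(who="tim")

# "$$" is the escape for a literal dollar sign.
escaped = Template("Give $who $$100").substitute(who="tim")
```

Here `safe_substitute()` plays the role that the patch assigns to `SafeTemplate`.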

7 files changed, +329 -93 lines changed

Doc/lib/libstring.tex

+122 -28
@@ -4,11 +4,23 @@ \section{\module{string} ---
 \declaremodule{standard}{string}
 \modulesynopsis{Common string operations.}
 
+The \module{string} package contains a number of useful constants and classes,
+as well as some deprecated legacy functions that are also available as methods
+on strings. See the module \refmodule{re}\refstmodindex{re} for string
+functions based on regular expressions.
 
-This module defines some constants useful for checking character
-classes and some useful string functions. See the module
-\refmodule{re}\refstmodindex{re} for string functions based on regular
-expressions.
+In general, all of these objects are exposed directly in the \module{string}
+package so users need only import the \module{string} package to begin using
+these constants, classes, and functions.
+
+\begin{notice}
+Starting with Python 2.4, the traditional \module{string} module was turned
+into a package, however backward compatibility with existing code has been
+retained. Code using the \module{string} module that worked prior to Python
+2.4 should continue to work unchanged.
+\end{notice}
+
+\subsection{String constants}
 
 The constants defined in this module are:
 
@@ -86,11 +98,113 @@ \section{\module{string} ---
 is undefined.
 \end{datadesc}
 
+\subsection{Template strings}
+
+Templates are Unicode strings that can be used to provide string substitutions
+as described in \pep{292}. There is a \class{Template} class that is a
+subclass of \class{unicode}, overriding the default \method{__mod__()} method.
+Instead of the normal \samp{\%}-based substitutions, Template strings support
+\samp{\$}-based substitutions, using the following rules:
+
+\begin{itemize}
+\item \samp{\$\$} is an escape; it is replaced with a single \samp{\$}.
+
+\item \samp{\$identifier} names a substitution placeholder matching a mapping
+key of "identifier". By default, "identifier" must spell a Python
+identifier. The first non-identifier character after the \samp{\$}
+character terminates this placeholder specification.
+
+\item \samp{\$\{identifier\}} is equivalent to \samp{\$identifier}. It is
+required when valid identifier characters follow the placeholder but are
+not part of the placeholder, e.g. "\$\{noun\}ification".
+\end{itemize}
+
+Any other appearance of \samp{\$} in the string will result in a
+\exception{ValueError} being raised.
+
+Template strings are used just like normal strings, in that the modulus
+operator is used to interpolate a dictionary of values into a Template string,
+e.g.:
+
+\begin{verbatim}
+>>> from string import Template
+>>> s = Template('$who likes $what')
+>>> print s % dict(who='tim', what='kung pao')
+tim likes kung pao
+>>> Template('Give $who $100') % dict(who='tim')
+Traceback (most recent call last):
+[...]
+ValueError: Invalid placeholder at index 10
+\end{verbatim}
+
+There is also a \class{SafeTemplate} class, derived from \class{Template}
+which acts the same as \class{Template}, except that if placeholders are
+missing in the interpolation dictionary, no \exception{KeyError} will be
+raised. Instead the original placeholder (with or without the braces, as
+appropriate) will be used:
+
+\begin{verbatim}
+>>> from string import SafeTemplate
+>>> s = SafeTemplate('$who likes $what for ${meal}')
+>>> print s % dict(who='tim')
+tim likes $what for ${meal}
+\end{verbatim}
+
+The values in the mapping will automatically be converted to Unicode strings,
+using the built-in \function{unicode()} function, which will be called without
+optional arguments \var{encoding} or \var{errors}.
+
+Advanced usage: you can derive subclasses of \class{Template} or
+\class{SafeTemplate} to use application-specific placeholder rules. To do
+this, you override the class attribute \member{pattern}; the value must be a
+compiled regular expression object with four named capturing groups. The
+capturing groups correspond to the rules given above, along with the invalid
+placeholder rule:
+
+\begin{itemize}
+\item \var{escaped} -- This group matches the escape sequence, i.e. \samp{\$\$}
+in the default pattern.
+\item \var{named} -- This group matches the unbraced placeholder name; it
+should not include the \samp{\$} in capturing group.
+\item \var{braced} -- This group matches the brace delimited placeholder name;
+it should not include either the \samp{\$} or braces in the capturing
+group.
+\item \var{bogus} -- This group matches any other \samp{\$}. It usually just
+matches a single \samp{\$} and should appear last.
+\end{itemize}
+
+\subsection{String functions}
+
+The following functions are available to operate on string and Unicode
+objects. They are not available as string methods.
+
+\begin{funcdesc}{capwords}{s}
+Split the argument into words using \function{split()}, capitalize
+each word using \function{capitalize()}, and join the capitalized
+words using \function{join()}. Note that this replaces runs of
+whitespace characters by a single space, and removes leading and
+trailing whitespace.
+\end{funcdesc}
+
+\begin{funcdesc}{maketrans}{from, to}
+Return a translation table suitable for passing to
+\function{translate()} or \function{regex.compile()}, that will map
+each character in \var{from} into the character at the same position
+in \var{to}; \var{from} and \var{to} must have the same length.
+
+\warning{Don't use strings derived from \constant{lowercase}
+and \constant{uppercase} as arguments; in some locales, these don't have
+the same length. For case conversions, always use
+\function{lower()} and \function{upper()}.}
+\end{funcdesc}
 
-Many of the functions provided by this module are also defined as
-methods of string and Unicode objects; see ``String Methods'' (section
-\ref{string-methods}) for more information on those.
-The functions defined in this module are:
+\subsection{Deprecated string functions}
+
+The following list of functions are also defined as methods of string and
+Unicode objects; see ``String Methods'' (section
+\ref{string-methods}) for more information on those. You should consider
+these functions as deprecated, although they will not be removed until Python
+3.0. The functions defined in this module are:
 
 \begin{funcdesc}{atof}{s}
 \deprecated{2.0}{Use the \function{float()} built-in function.}
@@ -138,14 +252,6 @@ \section{\module{string} ---
 Return a copy of \var{word} with only its first character capitalized.
 \end{funcdesc}
 
-\begin{funcdesc}{capwords}{s}
-Split the argument into words using \function{split()}, capitalize
-each word using \function{capitalize()}, and join the capitalized
-words using \function{join()}. Note that this replaces runs of
-whitespace characters by a single space, and removes leading and
-trailing whitespace.
-\end{funcdesc}
-
 \begin{funcdesc}{expandtabs}{s\optional{, tabsize}}
 Expand tabs in a string, i.e.\ replace them by one or more spaces,
 depending on the current column and the given tab size. The column
@@ -188,18 +294,6 @@ \section{\module{string} ---
 lower case.
 \end{funcdesc}
 
-\begin{funcdesc}{maketrans}{from, to}
-Return a translation table suitable for passing to
-\function{translate()} or \function{regex.compile()}, that will map
-each character in \var{from} into the character at the same position
-in \var{to}; \var{from} and \var{to} must have the same length.
-
-\warning{Don't use strings derived from \constant{lowercase}
-and \constant{uppercase} as arguments; in some locales, these don't have
-the same length. For case conversions, always use
-\function{lower()} and \function{upper()}.}
-\end{funcdesc}
-
 \begin{funcdesc}{split}{s\optional{, sep\optional{, maxsplit}}}
 Return a list of the words of the string \var{s}. If the optional
 second argument \var{sep} is absent or \code{None}, the words are
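
The "Advanced usage" paragraph in the documentation diff above describes overriding the `pattern` class attribute with four named groups. The following is a minimal sketch of that idea, written against the `string.Template` class in released Python 3 rather than the code in this patch. Two differences are assumptions to keep in mind: released Python names the fourth group `invalid` (the patch calls it `bogus`), and it accepts the pattern as a string that the class compiles itself in verbose mode. The `PercentTemplate` class and its `%` delimiter are hypothetical, chosen only for illustration.

```python
from string import Template


class PercentTemplate(Template):
    """Hypothetical subclass that uses '%' instead of '$' as the delimiter."""

    # Returned by substitute() when the 'escaped' group matches.
    delimiter = "%"

    # Four named groups, mirroring the rules in the documentation:
    # escaped, named, braced, and a catch-all for anything else.
    pattern = r"""
    %(?:
      (?P<escaped>%)                    |  # "%%" is the escape sequence
      (?P<named>[_a-z][_a-z0-9]*)       |  # "%identifier"
      \{(?P<braced>[_a-z][_a-z0-9]*)\}  |  # "%{identifier}"
      (?P<invalid>)                        # any other "%" is an error
    )
    """


t = PercentTemplate("%who likes %{what}, 100%%")
custom_result = t.substitute(who="tim", what="kung pao")

# A lone delimiter matches the catch-all group and raises ValueError.
try:
    PercentTemplate("50% off").substitute()
    lone_delimiter_raised = False
except ValueError:
    lone_delimiter_raised = True
```

The catch-all group is listed last so that it only matches when none of the valid forms do, which is the ordering the documentation asks for.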

Lib/sre.py

+1 -8
@@ -105,9 +105,6 @@
 
 __version__ = "2.2.1"
 
-# this module works under 1.5.2 and later. don't use string methods
-import string
-
 # flags
 I = IGNORECASE = sre_compile.SRE_FLAG_IGNORECASE # ignore case
 L = LOCALE = sre_compile.SRE_FLAG_LOCALE # assume current 8-bit locale
@@ -201,7 +198,7 @@ def escape(pattern):
                 s[i] = "\\000"
             else:
                 s[i] = "\\" + c
-    return _join(s, pattern)
+    return pattern[:0].join(s)
 
 # --------------------------------------------------------------------
 # internals
@@ -213,10 +210,6 @@ def escape(pattern):
 
 _MAXCACHE = 100
 
-def _join(seq, sep):
-    # internal: join into string having the same type as sep
-    return string.join(seq, sep[:0])
-
 def _compile(*key):
     # internal: compile pattern
     cachekey = (type(key[0]),) + key
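
The `pattern[:0].join(s)` replacement in `escape()` relies on the fact that slicing a string with `[:0]` yields an empty string of the same type, which can then serve as the separator. A minimal sketch of the idiom follows, assuming Python 3, where the two string types are `str` and `bytes` rather than the `str` and `unicode` of the patch's era. The helper name `join_like` is hypothetical.

```python
def join_like(pattern, parts):
    """Join parts with an empty separator of the same type as pattern."""
    # pattern[:0] is "" for str input and b"" for bytes input, so the
    # result has the same type as pattern without importing any module.
    return pattern[:0].join(parts)


text_result = join_like("a.b", ["a", "\\.", "b"])
bytes_result = join_like(b"a.b", [b"a", b"\\.", b"b"])
```

This is what lets the patch drop the `_join` helper and, with it, the `import string` that caused the circular import.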

Lib/sre_constants.py

+1 -2
@@ -217,12 +217,11 @@ def makedict(list):
 SRE_INFO_CHARSET = 4 # pattern starts with character from given set
 
 if __name__ == "__main__":
-    import string
     def dump(f, d, prefix):
         items = d.items()
         items.sort(key=lambda a: a[1])
         for k, v in items:
-            f.write("#define %s_%s %s\n" % (prefix, string.upper(k), v))
+            f.write("#define %s_%s %s\n" % (prefix, k.upper(), v))
     f = open("sre_constants.h", "w")
     f.write("""\
 /*

Lib/sre_parse.py

+13 -21
@@ -12,8 +12,7 @@
1212

1313
# XXX: show string offset and offending character for all errors
1414

15-
# this module works under 1.5.2 and later. don't use string methods
16-
import string, sys
15+
import sys
1716

1817
from sre_constants import *
1918

@@ -63,13 +62,6 @@
6362
"u": SRE_FLAG_UNICODE,
6463
}
6564

66-
# figure out best way to convert hex/octal numbers to integers
67-
try:
68-
int("10", 8)
69-
atoi = int # 2.0 and later
70-
except TypeError:
71-
atoi = string.atoi # 1.5.2
72-
7365
class Pattern:
7466
# master pattern object. keeps track of global attributes
7567
def __init__(self):
@@ -233,7 +225,7 @@ def isname(name):
233225
def _group(escape, groups):
234226
# check if the escape string represents a valid group
235227
try:
236-
gid = atoi(escape[1:])
228+
gid = int(escape[1:])
237229
if gid and gid < groups:
238230
return gid
239231
except ValueError:
@@ -256,13 +248,13 @@ def _class_escape(source, escape):
256248
escape = escape[2:]
257249
if len(escape) != 2:
258250
raise error, "bogus escape: %s" % repr("\\" + escape)
259-
return LITERAL, atoi(escape, 16) & 0xff
251+
return LITERAL, int(escape, 16) & 0xff
260252
elif escape[1:2] in OCTDIGITS:
261253
# octal escape (up to three digits)
262254
while source.next in OCTDIGITS and len(escape) < 5:
263255
escape = escape + source.get()
264256
escape = escape[1:]
265-
return LITERAL, atoi(escape, 8) & 0xff
257+
return LITERAL, int(escape, 8) & 0xff
266258
if len(escape) == 2:
267259
return LITERAL, ord(escape[1])
268260
except ValueError:
@@ -284,12 +276,12 @@ def _escape(source, escape, state):
284276
escape = escape + source.get()
285277
if len(escape) != 4:
286278
raise ValueError
287-
return LITERAL, atoi(escape[2:], 16) & 0xff
279+
return LITERAL, int(escape[2:], 16) & 0xff
288280
elif escape[1:2] == "0":
289281
# octal escape
290282
while source.next in OCTDIGITS and len(escape) < 4:
291283
escape = escape + source.get()
292-
return LITERAL, atoi(escape[1:], 8) & 0xff
284+
return LITERAL, int(escape[1:], 8) & 0xff
293285
elif escape[1:2] in DIGITS:
294286
# octal escape *or* decimal group reference (sigh)
295287
if source.next in DIGITS:
@@ -298,7 +290,7 @@ def _escape(source, escape, state):
298290
source.next in OCTDIGITS):
299291
# got three octal digits; this is an octal escape
300292
escape = escape + source.get()
301-
return LITERAL, atoi(escape[1:], 8) & 0xff
293+
return LITERAL, int(escape[1:], 8) & 0xff
302294
# got at least one decimal digit; this is a group reference
303295
group = _group(escape, state.groups)
304296
if group:
@@ -503,9 +495,9 @@ def _parse(source, state):
503495
source.seek(here)
504496
continue
505497
if lo:
506-
min = atoi(lo)
498+
min = int(lo)
507499
if hi:
508-
max = atoi(hi)
500+
max = int(hi)
509501
if max < min:
510502
raise error, "bad repeat interval"
511503
else:
@@ -617,7 +609,7 @@ def _parse(source, state):
617609
raise error, "unknown group name"
618610
else:
619611
try:
620-
condgroup = atoi(condname)
612+
condgroup = int(condname)
621613
except ValueError:
622614
raise error, "bad character in group name"
623615
else:
@@ -730,7 +722,7 @@ def literal(literal, p=p, pappend=a):
730722
if not name:
731723
raise error, "bad group name"
732724
try:
733-
index = atoi(name)
725+
index = int(name)
734726
except ValueError:
735727
if not isname(name):
736728
raise error, "bad character in group name"
@@ -754,7 +746,7 @@ def literal(literal, p=p, pappend=a):
754746
break
755747
if not code:
756748
this = this[1:]
757-
code = LITERAL, makechar(atoi(this[-6:], 8) & 0xff)
749+
code = LITERAL, makechar(int(this[-6:], 8) & 0xff)
758750
if code[0] is LITERAL:
759751
literal(code[1])
760752
else:
@@ -793,4 +785,4 @@ def expand_template(template, match):
793785
raise IndexError
794786
except IndexError:
795787
raise error, "empty group"
796-
return string.join(literals, sep)
788+
return sep.join(literals)
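
The `atoi` to `int` replacements above all depend on the built-in `int()` accepting an explicit base, which is what the removed `try`/`except` block was probing for. A small sketch of how the escape handlers turn digit strings into character codes, assuming Python 3; the function names are hypothetical and not part of `sre_parse`:

```python
def hex_escape_value(digits):
    """Convert hexadecimal digits to a character code, masked to 8 bits."""
    return int(digits, 16) & 0xff


def octal_escape_value(digits):
    """Convert octal digits to a character code, masked to 8 bits."""
    return int(digits, 8) & 0xff


code_from_hex = hex_escape_value("41")       # 0x41
code_from_octal = octal_escape_value("101")  # 0o101
masked = octal_escape_value("777")           # 0o777 is 511; the mask keeps the low byte

# Non-digit input raises ValueError, which the parser catches.
try:
    int("x1", 16)
    bad_digits_raised = False
except ValueError:
    bad_digits_raised = True
```

The `& 0xff` mask matches the diff: values above 255 are truncated to their low byte rather than rejected.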
