Following bebraw's suggestion, here is what I ended up using (in a separate module, of course):
```python
import re


class Subs(object):
    """
    A container holding strings to be searched for and replaced in
    replace_multi().

    Holds little relation to the sandwich.
    """

    def __init__(self, needles_and_replacements):
        """
        Returns a new instance of the Subs class, given a dictionary holding
        the keys to be searched for and the values to be used as replacements.
        """
        self.lookup = needles_and_replacements
        self.regex = re.compile('|'.join(map(re.escape,
                                             needles_and_replacements)))


def replace_multi(string, subs):
    """
    Replaces the given items in string efficiently in a single pass.

    "string" should be the string to be searched.
    "subs" can be either:
        A.) a dictionary containing as its keys the items to be
            searched for and as its values their replacements,
     or B.) a pre-compiled instance of the Subs class from this module
            (which may have slightly better performance if this is
            called often).
    """
    if not isinstance(subs, Subs):  # Assume dictionary if not our class.
        subs = Subs(subs)
    lookup = subs.lookup
    return subs.regex.sub(lambda match: lookup[match.group(0)], string)
```
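To show what the single-pass approach buys over chaining `str.replace`, here is a minimal, self-contained sketch of the same idea: a regex alternation of the escaped keys, with `re.sub` calling back into a dict lookup. The name `single_pass_replace` is hypothetical, and sorting the keys longest-first is my own addition (not in the code above) so that overlapping keys such as `"ab"` and `"a"` prefer the longer match. With a swap table, chained replaces re-process their own output, while a single pass does not.

```python
import re


def single_pass_replace(text, mapping):
    """Replace every key of `mapping` found in `text` in one scan (hypothetical helper)."""
    # Longest keys first, so that when two keys match at the same position
    # (e.g. "ab" and "a") the regex alternation picks the longer one.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile('|'.join(map(re.escape, keys)))
    # Each match is looked up in the dict; replaced text is never rescanned.
    return pattern.sub(lambda match: mapping[match.group(0)], text)


swap = {'cat': 'dog', 'dog': 'cat'}
text = 'cat chases dog'

# One pass: each word is replaced exactly once.
print(single_pass_replace(text, swap))                   # dog chases cat
# Chained: the second replace also rewrites what the first one produced.
print(text.replace('cat', 'dog').replace('dog', 'cat'))  # cat chases cat
```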
Example usage:
```python
def escape(string):
    """
    Returns the given string with ampersands, quotes and angle
    brackets encoded.
    """
    # Note that ampersands must be escaped first; the rest can be escaped in
    # any order.
    escape.subs = Subs({'<': '&lt;', '>': '&gt;', "'": '&#39;', '"': '&quot;'})
    return replace_multi(string.replace('&', '&amp;'), escape.subs)
```
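The comment about escaping ampersands first matters because every entity itself begins with `&`. A small Python 3 sketch (the function names are hypothetical) contrasts the correct order with doing `&` last, and compares both with the standard library's `html.escape`, which does the same job but encodes the single quote as `&#x27;` rather than `&#39;`.

```python
import html

# Same entity table as in escape() above, minus the ampersand.
ENTITIES = {'<': '&lt;', '>': '&gt;', "'": '&#39;', '"': '&quot;'}


def escape_amp_first(s):
    # '&' first: the entities inserted afterwards are left alone.
    s = s.replace('&', '&amp;')
    for char, entity in ENTITIES.items():
        s = s.replace(char, entity)
    return s


def escape_amp_last(s):
    # Wrong order on purpose: the '&' at the start of every entity
    # inserted by the loop gets escaped a second time.
    for char, entity in ENTITIES.items():
        s = s.replace(char, entity)
    return s.replace('&', '&amp;')


sample = '<b>Tom & "Jerry"</b>'
print(escape_amp_first(sample))  # &lt;b&gt;Tom &amp; &quot;Jerry&quot;&lt;/b&gt;
print(escape_amp_last(sample))   # &amp;lt;b&amp;gt;Tom &amp; &amp;quot;Jerry&amp;quot;&amp;lt;/b&amp;gt;
print(html.escape(sample))       # same as escape_amp_first (no single quote in sample)
```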
Much better :). Thanks for the help.
Edit:
Never mind, Mike Graham was right. I benchmarked it, and the single-pass replacement turns out to be much slower.
Code:
```python
# Python 2 (urllib2, print statement).
# Subs and replace_multi (from the module above) are assumed to be in scope.
from urllib2 import urlopen
import timeit


def escape1(string):
    """
    Returns the given string with ampersands, quotes and angle
    brackets encoded.
    """
    return (string.replace('&', '&amp;').replace('<', '&lt;')
            .replace('>', '&gt;').replace("'", '&#39;').replace('"', '&quot;'))


def escape2(string):
    """
    Returns the given string with ampersands, quotes and angle
    brackets encoded.
    """
    # Note that ampersands must be escaped first; the rest can be escaped in
    # any order.
    escape2.subs = Subs({'<': '&lt;', '>': '&gt;', "'": '&#39;', '"': '&quot;'})
    return replace_multi(string.replace('&', '&amp;'), escape2.subs)


# An example test on the stackoverflow homepage.
request = urlopen('http://stackoverflow.com')
test_string = request.read()
request.close()

test1 = timeit.Timer('escape1(test_string)',
                     setup='from __main__ import escape1, test_string')
test2 = timeit.Timer('escape2(test_string)',
                     setup='from __main__ import escape2, test_string')
print 'multi-pass:', test1.timeit(2000)
print 'single-pass:', test2.timeit(2000)
```
Output (total seconds for 2000 calls each):
```
multi-pass: 15.9897229671
single-pass: 66.5422530174
```
So much for that.
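A plausible explanation for the gap (not something the benchmark above isolates): each `str.replace` call scans the string entirely in C, whereas `re.sub` with a callback runs a Python-level lambda and a dict lookup for every match. `escape2()` also rebuilds its `Subs` object on every call. Below is a Python 3 sketch of a somewhat fairer comparison, with the pattern built once at module level. The synthetic input stands in for the downloaded page, and the names are hypothetical. Timings depend on machine and input, so none are quoted here.

```python
import re
import timeit

ENTITIES = {'<': '&lt;', '>': '&gt;', "'": '&#39;', '"': '&quot;'}
# Compiled once, instead of once per call as in escape2() above.
PATTERN = re.compile('|'.join(map(re.escape, ENTITIES)))


def escape_chained(s):
    # Multi-pass: five str.replace calls, each a C-level scan of the string.
    return (s.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;')
            .replace("'", '&#39;').replace('"', '&quot;'))


def escape_single_pass(s):
    # One str.replace for '&', then one regex pass for the rest;
    # a Python-level callback runs for every match.
    return PATTERN.sub(lambda m: ENTITIES[m.group(0)], s.replace('&', '&amp;'))


# Synthetic stand-in for the downloaded page (an assumption, not the original input).
test_string = '<div class="post">Tom & Jerry\'s "show"</div>\n' * 2000

assert escape_chained(test_string) == escape_single_pass(test_string)

for name, func in [('multi-pass', escape_chained), ('single-pass', escape_single_pass)]:
    seconds = timeit.timeit(lambda: func(test_string), number=50)
    print(name + ':', seconds)
```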