source: OpenRLabs-Git/deploy/rlabs-docker/web2py-rlabs/gluon/contrib/pyuca/README.markmin

Last change on this file was 42bd667, checked in by David Fuertes <dfuertes@…>, 4 years ago

# pyuca: Python Unicode Collation Algorithm implementation
(http://jtauber.com/blog/2006/01/27/python_unicode_collation_algorithm/)

This is my preliminary attempt at a Python implementation of the
[Unicode Collation Algorithm (UCA)](http://unicode.org/reports/tr10/).
I originally posted it to my blog in 2006 but it seems to get enough
usage it really belongs here (and in PyPI).

What do you use it for? In short, sorting non-English strings properly.

The core of the algorithm involves multi-level comparison. For example,
``café`` comes before ``caff`` because at the primary level, the accent
is ignored and the first word is treated as if it were ``cafe``.
The secondary level (which considers accents) then applies only to words
that are equivalent at the primary level.
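
As a rough illustration of the multi-level idea, the sketch below builds a two-level key using only the standard library. It is not how pyuca works internally (pyuca takes its weights from ``allkeys.txt``); ``toy_sort_key`` is a hypothetical helper that treats base letters as the primary level and combining accents as the secondary level.

```python
import unicodedata


def toy_sort_key(word):
    """Two-level key for illustration only: base letters first, accents second."""
    # NFD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", word.casefold())
    primary = []    # base letters, accents ignored
    secondary = []  # combining marks attached to each base letter
    for ch in decomposed:
        if unicodedata.combining(ch):
            if secondary:
                secondary[-1] += ch
        else:
            primary.append(ch)
            secondary.append("")
    # Tuples compare left to right, so the secondary level only matters
    # when the primary levels are equal.
    return (tuple(primary), tuple(secondary))


words = ["caff", "caf\u00e9", "cafe"]  # "caf\u00e9" is "café"
print(sorted(words))                    # ['cafe', 'caff', 'café'] (code point order)
print(sorted(words, key=toy_sort_key))  # ['cafe', 'café', 'caff']
```

Plain code point order puts ``café`` last because ``é`` has a higher code point than ``f``; the two-level key places it next to ``cafe`` instead.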

The Unicode Collation Algorithm and pyuca also support contraction and
expansion. **Contraction** is where multiple letters are treated as a
single unit. In traditional Spanish ordering, ``ch`` is treated as a letter
coming between ``c`` and ``d`` so that, for example, words beginning with
``ch`` sort after all other words beginning with ``c``. **Expansion** is where
a single letter is treated as though it were multiple letters. In German
phone book ordering, ``ä`` is sorted as if it were ``ae``, i.e. after ``ad``
but before ``af``.
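
The two ideas can be sketched in a few lines of plain Python. This is a toy model, not pyuca's implementation: ``toy_key`` and its ``contractions`` and ``expansions`` parameters are hypothetical names, and the rules passed in are hard-coded for the Spanish and German examples above.

```python
def toy_key(word, contractions=(), expansions=None):
    """Build a list of (letter, rank) units for illustration only."""
    expansions = expansions or {}
    units = []
    i = 0
    while i < len(word):
        for seq in contractions:
            if word.startswith(seq, i):
                # Contraction: the whole sequence becomes one unit that
                # sorts just after its first letter (rank 1 instead of 0).
                units.append((seq[0], 1))
                i += len(seq)
                break
        else:
            # Expansion: one character may stand for several letters.
            for ch in expansions.get(word[i], word[i]):
                units.append((ch, 0))
            i += 1
    return units


spanish = ["chico", "cuna", "dama", "casa"]
print(sorted(spanish, key=lambda w: toy_key(w, contractions=("ch",))))
# ['casa', 'cuna', 'chico', 'dama']

german = ["af", "\u00e4", "ad"]  # "\u00e4" is "ä"
print(sorted(german, key=lambda w: toy_key(w, expansions={"\u00e4": "ae"})))
# ['ad', 'ä', 'af']
```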

## Here is how to install and use the ``pyuca`` module:
``
git clone https://github.com/jtauber/pyuca.git
cd pyuca
pip install .
``
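
**Usage example:**
``
from pyuca import Collator

# Load the collation table from a local copy of allkeys.txt
c = Collator("allkeys.txt")

words = ["caff", "café", "cafe"]  # any list of Unicode strings
sorted_words = sorted(words, key=c.sort_key)
``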

``allkeys.txt`` (1 MB) is available at

http://www.unicode.org/Public/UCA/latest/allkeys.txt
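
For readers curious about what the file contains, here is a hedged sketch of reading one of its entries. It assumes the layout described in the UCA report (hex code points, a semicolon, then one or more bracketed collation elements, with ``#`` starting a comment). ``parse_allkeys_line`` is a hypothetical helper, not part of pyuca, and the weights in the sample lines are made up for illustration rather than copied from any particular version of the file.

```python
import re

# One collation element looks like [.1C47.0020.0008] or [*0209.0020.0002].
ELEMENT = re.compile(r"\[([.*])([0-9A-Fa-f.]+)\]")


def parse_allkeys_line(line):
    """Return (characters, list of weight tuples), or None for non-entry lines."""
    line = line.split("#", 1)[0].strip()  # drop the trailing comment
    if not line or line.startswith("@"):  # blank lines and @ directives
        return None
    codepoints, _, elements = line.partition(";")
    chars = "".join(chr(int(cp, 16)) for cp in codepoints.split())
    weights = [
        tuple(int(w, 16) for w in body.split(".") if w)
        for _flag, body in ELEMENT.findall(elements)
    ]
    return chars, weights


# Illustrative sample lines (weights are assumptions, not real table values).
print(parse_allkeys_line("0041  ; [.1C47.0020.0008] # LATIN CAPITAL LETTER A"))
print(parse_allkeys_line("00E6  ; [.1C47.0020.0004][.0000.0110.0004][.1CAA.0020.0004]"))
```

An entry with several code points on the left corresponds to a contraction, and an entry with several bracketed elements on the right corresponds to an expansion.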