Support for more language-specific characters, escape ' inside author.py #35

Open
orlra opened this issue Feb 27, 2017 · 0 comments

orlra commented Feb 27, 2017

Code triggered when you search by author:
https://github.com/lucastheis/django-publications/blob/develop/publications/views/author.py#L32

    surname = names[-1]
    surname = surname.replace(u'ä', u'%%')
    surname = surname.replace(u'ae', u'%%')
    surname = surname.replace(u'ö', u'%%')
    surname = surname.replace(u'oe', u'%%')
    surname = surname.replace(u'ü', u'%%')
    surname = surname.replace(u'ue', u'%%')
    surname = surname.replace(u'ß', u'%%')
    surname = surname.replace(u'ss', u'%%')

    query_str = u'SELECT * FROM {table} ' \
                'WHERE lower({table}.authors) LIKE lower(\'%%{surname}%%\') ' \
                'ORDER BY {table}.year DESC, {table}.month DESC, {table}.id DESC'

This raises an error whenever I search for a string containing a quotation mark.
EXAMPLE
publications/s.+zheng/CONCAT('whs(', ')SQLi')/

File "/data/www/venv/local/lib/python2.7/site-packages/django/db/backends/utils.py", line 64, in execute
    return self.cursor.execute(sql, params)
ProgrammingError: syntax error at or near "whs"
LINE 1: ...s_publication.authors) LIKE lower('%zheng/CONCAT('whs(',')SQ...

This is very bad: the search string is formatted into the SQL unescaped, so the query is open to SQL injection.
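
For discussion, here is a minimal sketch of the usual fix: pass the search string as a bound parameter instead of formatting it into the SQL text. It uses the standard-library `sqlite3` module as a stand-in for the real database so that it runs on its own; the `publication` table and the `find_by_surname` helper are hypothetical, not the project's code. With Django the placeholder would be `%s` and the parameters would go to `cursor.execute(sql, params)` or `Manager.raw(query, params)`.

```python
import sqlite3


def find_by_surname(conn, surname):
    """Return rows whose authors field contains surname (case-insensitive)."""
    # The pattern is built in Python and passed as a bound parameter,
    # so quotes in the user's input never become part of the SQL text.
    pattern = "%" + surname + "%"
    sql = (
        "SELECT id, authors FROM publication "
        "WHERE lower(authors) LIKE lower(?) "
        "ORDER BY year DESC, month DESC, id DESC"
    )
    return conn.execute(sql, (pattern,)).fetchall()


# Hypothetical in-memory table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE publication "
    "(id INTEGER PRIMARY KEY, authors TEXT, year INTEGER, month INTEGER)"
)
conn.execute(
    "INSERT INTO publication (authors, year, month) VALUES (?, ?, ?)",
    ("S. Zheng", 2016, 5),
)

normal = find_by_surname(conn, "zheng")
hostile = find_by_surname(conn, "zheng/CONCAT('whs(', ')SQLi')")
```

The hostile string from the example above is treated as plain data here: it matches no rows instead of producing a syntax error. The umlaut-to-wildcard replacements could still be applied to `surname` before it is bound.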
The next error I found, a few lines later:
publications/d.+agard/+ADw-whscheck+AD4-/

 File "/data/www/venv/local/lib/python2.7/site-packages/publications/views/person.py", line 23, in person
    author = author[:off] + author[off].upper() + author[off + 1:]
IndexError: string index out of range
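
A small sketch of how the failing line could be guarded. How `off` is computed in `person.py` is not shown here, so this only assumes it is an integer offset into `author`; `upper_at` is a hypothetical helper name.

```python
def upper_at(author, off):
    """Upper-case the character at index off, leaving author unchanged if off is out of range."""
    # Guard against offsets past the end (or negative), which is what
    # raises "string index out of range" in the traceback above.
    if not 0 <= off < len(author):
        return author
    return author[:off] + author[off].upper() + author[off + 1:]
```

This only avoids the crash. Whether an out-of-range offset should instead be rejected earlier, for example with a 404, is a separate decision.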

From what I see in the code, you support a few special characters, but most are ignored.

From what I see in a database with around 4k publications, there are many more 'special characters' in the data than you support:

 char | count of this char across all rows of the authors field
 è     |      1
 <U+00AD>     |      1
 ě     |      1
 4     |      1
 ć     |      1
 ı     |      1
 ’     |      1
 à     |      1
 ï     |      1
 ý     |      1
 ã     |      1
 ň     |      1
 (     |      2
 ú     |      2
 ́      |      2
 1     |      2
 0     |      3
 ä     |      3
 č     |      4
 ñ     |      4
       |      4
 ř     |      4
 š     |      6
 é     |      9
 &     |     10
 í     |     10
 ö     |     11
 ü     |     13
 '     |     14
 ó     |     17
 á     |     29

I have no good solution here. I thought of using slugify from django.utils.text, as it is the safest approach, but it would need either another column in the table or an annotation (which will be slow), and it could conflate authors with similar surnames. Open for discussion.
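
To make the trade-off concrete, here is a rough sketch of accent folding with the standard-library `unicodedata` module, as one possible alternative to slugify. It assumes Python 3 (`str.casefold` does not exist on the Python 2.7 shown in the tracebacks), and `fold_name` is a hypothetical helper, not part of django-publications.

```python
import unicodedata


def fold_name(name):
    """Return a lower-cased, accent-free key for comparing surnames."""
    # NFKD splits characters such as 'ü' into a base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() lower-cases and also maps 'ß' to 'ss'.
    return stripped.casefold()
```

It also shows the collision problem: `fold_name("Müller")` and `fold_name("Muller")` both give `"muller"`, so two different authors would land on the same key. Characters without a decomposition, such as `ı`, pass through unchanged.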

Code used to count the characters in a field (PostgreSQL):

WITH RECURSIVE itemChars(aChar, remain) AS (
   SELECT LEFT(lower(authors),1), RIGHT(lower(authors), LENGTH(authors)-1) 
      FROM publications_publication WHERE LENGTH(authors)>0
   UNION ALL
   SELECT LEFT(remain,1), RIGHT(remain, LENGTH(remain)-1) FROM itemChars
      WHERE LENGTH(remain)>0
)
SELECT aChar, COUNT(*) as amount FROM itemChars
GROUP BY aChar ORDER BY amount;
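
The same count can be sketched in Python if the author strings are already loaded, for example from a queryset's `values_list`. `count_chars` is a hypothetical helper and the sample names are made up for illustration.

```python
from collections import Counter


def count_chars(author_fields):
    """Count each lower-cased character across all author strings."""
    counts = Counter()
    for authors in author_fields:
        # Iterating over a string yields its individual characters.
        counts.update(authors.lower())
    return counts


# Made-up sample data, not taken from the real database.
sample = ["Müller, A.", "Agard, D."]
counts = count_chars(sample)
# Sort ascending by count, like ORDER BY amount in the query above.
ordered = sorted(counts.items(), key=lambda item: item[1])
```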

Django 1.8.17
django-publications 0.6.2/0.6.3/develop
