Commit 223444a

Merge pull request #1 from dropbox/master
Merge upstream
2 parents 4754b49 + ac09074 commit 223444a

20 files changed: +804 -153 lines changed

README.rst

Lines changed: 63 additions & 24 deletions
@@ -1,31 +1,31 @@
-================================
-Project is currently unsupported
-================================
+========================================================
+PyHive project has been donated to Apache Kyuubi
+========================================================
 
+You can follow it's development and report any issues you are experiencing here: https://github.com/apache/kyuubi/tree/master/python/pyhive
 
 
 
-.. image:: https://travis-ci.org/dropbox/PyHive.svg?branch=master
-    :target: https://travis-ci.org/dropbox/PyHive
-.. image:: https://img.shields.io/codecov/c/github/dropbox/PyHive.svg
+Legacy notes / instructions
+===========================
 
-======
 PyHive
-======
+**********
+
 
 PyHive is a collection of Python `DB-API <http://www.python.org/dev/peps/pep-0249/>`_ and
-`SQLAlchemy <http://www.sqlalchemy.org/>`_ interfaces for `Presto <http://prestodb.io/>`_ and
-`Hive <http://hive.apache.org/>`_.
+`SQLAlchemy <http://www.sqlalchemy.org/>`_ interfaces for `Presto <http://prestodb.io/>`_ ,
+`Hive <http://hive.apache.org/>`_ and `Trino <https://trino.io/>`_.
 
 Usage
-=====
+**********
 
 DB-API
 ------
 .. code-block:: python
 
     from pyhive import presto # or import hive or import trino
-    cursor = presto.connect('localhost').cursor()
+    cursor = presto.connect('localhost').cursor() # or use hive.connect or use trino.connect
     cursor.execute('SELECT * FROM my_awesome_data LIMIT 10')
     print cursor.fetchone()
     print cursor.fetchall()
@@ -61,7 +61,7 @@ In Python 3.7 `async` became a keyword; you can use `async_` instead:
 
 SQLAlchemy
 ----------
-First install this package to register it with SQLAlchemy (see ``setup.py``).
+First install this package to register it with SQLAlchemy, see ``entry_points`` in ``setup.py``.
 
 .. code-block:: python
 
@@ -71,9 +71,11 @@ First install this package to register it with SQLAlchemy (see ``setup.py``).
     # Presto
     engine = create_engine('presto://localhost:8080/hive/default')
     # Trino
-    engine = create_engine('trino://localhost:8080/hive/default')
+    engine = create_engine('trino+pyhive://localhost:8080/hive/default')
     # Hive
     engine = create_engine('hive://localhost:10000/default')
+
+    # SQLAlchemy < 2.0
     logs = Table('my_awesome_data', MetaData(bind=engine), autoload=True)
     print select([func.count('*')], from_obj=logs).scalar()
 
@@ -82,6 +84,20 @@ First install this package to register it with SQLAlchemy (see ``setup.py``).
     logs = Table('my_awesome_data', MetaData(bind=engine), autoload=True)
     print select([func.count('*')], from_obj=logs).scalar()
 
+    # SQLAlchemy >= 2.0
+    metadata_obj = MetaData()
+    books = Table("books", metadata_obj, Column("id", Integer), Column("title", String), Column("primary_author", String))
+    metadata_obj.create_all(engine)
+    inspector = inspect(engine)
+    inspector.get_columns('books')
+
+    with engine.connect() as con:
+        data = [{ "id": 1, "title": "The Hobbit", "primary_author": "Tolkien" },
+        { "id": 2, "title": "The Silmarillion", "primary_author": "Tolkien" }]
+        con.execute(books.insert(), data[0])
+        result = con.execute(text("select * from books"))
+        print(result.fetchall())
+
 Note: query generation functionality is not exhaustive or fully tested, but there should be no
 problem with raw SQL.
 
@@ -101,7 +117,7 @@ Passing session configuration
                       'session_props': {'query_max_run_time': '1234m'}}
     )
     create_engine(
-        'trino://user@host:443/hive',
+        'trino+pyhive://user@host:443/hive',
         connect_args={'protocol': 'https',
                       'session_props': {'query_max_run_time': '1234m'}}
     )
@@ -116,27 +132,30 @@ Passing session configuration
     )
 
 Requirements
-============
+************
 
 Install using
 
-- ``pip install 'pyhive[hive]'`` for the Hive interface and
-- ``pip install 'pyhive[presto]'`` for the Presto interface.
+- ``pip install 'pyhive[hive]'`` or ``pip install 'pyhive[hive_pure_sasl]'`` for the Hive interface
+- ``pip install 'pyhive[presto]'`` for the Presto interface
 - ``pip install 'pyhive[trino]'`` for the Trino interface
 
+Note: ``'pyhive[hive]'`` extras uses `sasl <https://pypi.org/project/sasl/>`_ that doesn't support Python 3.11, See `github issue <https://github.com/cloudera/python-sasl/issues/30>`_.
+Hence PyHive also supports `pure-sasl <https://pypi.org/project/pure-sasl/>`_ via additional extras ``'pyhive[hive_pure_sasl]'`` which support Python 3.11.
+
 PyHive works with
 
 - Python 2.7 / Python 3
-- For Presto: Presto install
-- For Trino: Trino install
+- For Presto: `Presto installation <https://prestodb.io/docs/current/installation.html>`_
+- For Trino: `Trino installation <https://trino.io/docs/current/installation.html>`_
 - For Hive: `HiveServer2 <https://cwiki.apache.org/confluence/display/Hive/Setting+up+HiveServer2>`_ daemon
 
 Changelog
-=========
+*********
 See https://github.com/dropbox/PyHive/releases.
 
 Contributing
-============
+************
 - Please fill out the Dropbox Contributor License Agreement at https://opensource.dropbox.com/cla/ and note this in your pull request.
 - Changes must come with tests, with the exception of trivial things like fixing comments. See .travis.yml for the test environment setup.
 - Notes on project scope:
@@ -146,8 +165,28 @@ Contributing
   - We prefer having a small number of generic features over a large number of specialized, inflexible features.
     For example, the Presto code takes an arbitrary ``requests_session`` argument for customizing HTTP calls, as opposed to having a separate parameter/branch for each ``requests`` option.
 
+Tips for test environment setup
+****************************************
+You can setup test environment by following ``.travis.yaml`` in this repository. It uses `Cloudera's CDH 5 <https://docs.cloudera.com/documentation/enterprise/release-notes/topics/cdh_vd_cdh_download_510.html>`_ which requires username and password for download.
+It may not be feasible for everyone to get those credentials. Hence below are alternative instructions to setup test environment.
+
+You can clone `this repository <https://github.com/big-data-europe/docker-hive/blob/master/docker-compose.yml>`_ which has Docker Compose setup for Presto and Hive.
+You can add below lines to its docker-compose.yaml to start Trino in same environment::
+
+    trino:
+        image: trinodb/trino:351
+        ports:
+            - "18080:18080"
+        volumes:
+            - ./trino:/etc/trino
+
+Note: ``./trino`` for docker volume defined above is `trino config from PyHive repository <https://github.com/dropbox/PyHive/tree/master/scripts/travis-conf/trino>`_
+
+Then run::
+    docker-compose up -d
+
 Testing
-=======
+*******
 .. image:: https://travis-ci.org/dropbox/PyHive.svg
     :target: https://travis-ci.org/dropbox/PyHive
 .. image:: http://codecov.io/github/dropbox/PyHive/coverage.svg?branch=master
@@ -166,7 +205,7 @@ WARNING: This drops/creates tables named ``one_row``, ``one_row_complex``, and `
 database called ``pyhive_test_database``.
 
 Updating TCLIService
-====================
+********************
 
 The TCLIService module is autogenerated using a ``TCLIService.thrift`` file. To update it, the
 ``generate.py`` file can be used: ``python generate.py <TCLIServiceURL>``. When left blank, the

dev_requirements.txt

Lines changed: 2 additions & 0 deletions
@@ -12,6 +12,8 @@ pytest-timeout==1.2.0
 requests>=1.0.0
 requests_kerberos>=0.12.0
 sasl>=0.2.1
+pure-sasl>=0.6.2
+kerberos>=1.3.0
 thrift>=0.10.0
 #thrift_sasl>=0.1.0
 git+https://github.com/cloudera/thrift_sasl # Using master branch in order to get Python 3 SASL patches

pyhive/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1,3 +1,3 @@
 from __future__ import absolute_import
 from __future__ import unicode_literals
-__version__ = '0.6.3'
+__version__ = '0.7.0'

pyhive/common.py

Lines changed: 6 additions & 1 deletion
@@ -18,6 +18,11 @@
 from future.utils import with_metaclass
 from itertools import islice
 
+try:
+    from collections.abc import Iterable
+except ImportError:
+    from collections import Iterable
+
 
 class DBAPICursor(with_metaclass(abc.ABCMeta, object)):
     """Base class for some common DB-API logic"""
@@ -245,7 +250,7 @@ def escape_item(self, item):
             return self.escape_number(item)
         elif isinstance(item, basestring):
             return self.escape_string(item)
-        elif isinstance(item, collections.Iterable):
+        elif isinstance(item, Iterable):
             return self.escape_sequence(item)
         elif isinstance(item, datetime.datetime):
             return self.escape_datetime(item, self._DATETIME_FORMAT)
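
The import shim in this hunk can be tried on its own. The following is a minimal sketch, not PyHive's actual ``escape_item``: the ``classify`` helper is hypothetical and only illustrates why the string check has to come before the ``Iterable`` check.

```python
try:
    # Python 3.3+: the ABCs live in collections.abc
    # (the old aliases in `collections` were removed in Python 3.10).
    from collections.abc import Iterable
except ImportError:
    # Python 2 fallback
    from collections import Iterable


def classify(item):
    """Hypothetical helper mirroring the order of checks in escape_item."""
    if isinstance(item, (int, float)):
        return "number"
    if isinstance(item, str):
        # Strings are iterable too, so this test must come first.
        return "string"
    if isinstance(item, Iterable):
        return "sequence"
    raise TypeError("unsupported type: {!r}".format(type(item)))


print(classify(3))
print(classify("abc"))
print(classify([1, 2, 3]))
```

If the ``Iterable`` branch were tested before the string branch, every string would be escaped as a sequence of characters.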

pyhive/hive.py

Lines changed: 41 additions & 15 deletions
@@ -54,6 +54,45 @@
 }
 
 
+def get_sasl_client(host, sasl_auth, service=None, username=None, password=None):
+    import sasl
+    sasl_client = sasl.Client()
+    sasl_client.setAttr('host', host)
+
+    if sasl_auth == 'GSSAPI':
+        sasl_client.setAttr('service', service)
+    elif sasl_auth == 'PLAIN':
+        sasl_client.setAttr('username', username)
+        sasl_client.setAttr('password', password)
+    else:
+        raise ValueError("sasl_auth only supports GSSAPI and PLAIN")
+
+    sasl_client.init()
+    return sasl_client
+
+
+def get_pure_sasl_client(host, sasl_auth, service=None, username=None, password=None):
+    from pyhive.sasl_compat import PureSASLClient
+
+    if sasl_auth == 'GSSAPI':
+        sasl_kwargs = {'service': service}
+    elif sasl_auth == 'PLAIN':
+        sasl_kwargs = {'username': username, 'password': password}
+    else:
+        raise ValueError("sasl_auth only supports GSSAPI and PLAIN")
+
+    return PureSASLClient(host=host, **sasl_kwargs)
+
+
+def get_installed_sasl(host, sasl_auth, service=None, username=None, password=None):
+    try:
+        return get_sasl_client(host=host, sasl_auth=sasl_auth, service=service, username=username, password=password)
+        # The sasl library is available
+    except ImportError:
+        # Fallback to pure-sasl library
+        return get_pure_sasl_client(host=host, sasl_auth=sasl_auth, service=service, username=username, password=password)
+
+
 def _parse_timestamp(value):
     if value:
         match = _TIMESTAMP_PATTERN.match(value)
@@ -232,7 +271,6 @@ def __init__(
                 self._transport = thrift.transport.TTransport.TBufferedTransport(socket)
             elif auth in ('LDAP', 'KERBEROS', 'NONE', 'CUSTOM'):
                 # Defer import so package dependency is optional
-                import sasl
                 import thrift_sasl
 
                 if auth == 'KERBEROS':
@@ -243,20 +281,8 @@ def __init__(
                     if password is None:
                         # Password doesn't matter in NONE mode, just needs to be nonempty.
                         password = 'x'
-
-                def sasl_factory():
-                    sasl_client = sasl.Client()
-                    sasl_client.setAttr('host', host)
-                    if sasl_auth == 'GSSAPI':
-                        sasl_client.setAttr('service', kerberos_service_name)
-                    elif sasl_auth == 'PLAIN':
-                        sasl_client.setAttr('username', username)
-                        sasl_client.setAttr('password', password)
-                    else:
-                        raise AssertionError
-                    sasl_client.init()
-                    return sasl_client
-                self._transport = thrift_sasl.TSaslClientTransport(sasl_factory, sasl_auth, socket)
+
+                self._transport = thrift_sasl.TSaslClientTransport(lambda: get_installed_sasl(host=host, sasl_auth=sasl_auth, service=kerberos_service_name, username=username, password=password), sasl_auth, socket)
             else:
                 # All HS2 config options:
                 # https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2#SettingUpHiveServer2-Configuration
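
The fallback in ``get_installed_sasl`` relies on the ``import sasl`` statement sitting inside the factory, so a missing package surfaces as ``ImportError`` at call time. The pattern can be sketched without either SASL library installed. Everything below is illustrative: ``build_sasl_kwargs``, ``first_importable``, ``native_factory`` and ``pure_factory`` are hypothetical names, not part of PyHive.

```python
def build_sasl_kwargs(sasl_auth, service=None, username=None, password=None):
    """Pick keyword arguments per mechanism, as get_pure_sasl_client does."""
    if sasl_auth == 'GSSAPI':
        return {'service': service}
    if sasl_auth == 'PLAIN':
        return {'username': username, 'password': password}
    raise ValueError("sasl_auth only supports GSSAPI and PLAIN")


def first_importable(primary, fallback, **kwargs):
    """Call primary; if it raises ImportError, call fallback instead."""
    try:
        return primary(**kwargs)
    except ImportError:
        return fallback(**kwargs)


# Hypothetical stand-ins: the first simulates a missing C extension.
def native_factory(**kwargs):
    raise ImportError("No module named 'sasl'")


def pure_factory(**kwargs):
    return ('pure-sasl', kwargs)


client = first_importable(
    native_factory, pure_factory,
    host='localhost',
    **build_sasl_kwargs('PLAIN', username='u', password='x')
)
print(client)
```

Only ``ImportError`` triggers the fallback in this sketch; any other exception raised by the primary factory still propagates to the caller.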

pyhive/presto.py

Lines changed: 12 additions & 6 deletions
@@ -9,6 +9,8 @@
 from __future__ import unicode_literals
 
 from builtins import object
+from decimal import Decimal
+
 from pyhive import common
 from pyhive.common import DBAPITypeObject
 # Make all exceptions visible in this module per DB-API
@@ -34,6 +36,11 @@
 
 _logger = logging.getLogger(__name__)
 
+TYPES_CONVERTER = {
+    "decimal": Decimal,
+    # As of Presto 0.69, binary data is returned as the varbinary type in base64 format
+    "varbinary": base64.b64decode
+}
 
 class PrestoParamEscaper(common.ParamEscaper):
     def escape_datetime(self, item, format):
@@ -307,14 +314,13 @@ def _fetch_more(self):
         """Fetch the next URI and update state"""
         self._process_response(self._requests_session.get(self._nextUri, **self._requests_kwargs))
 
-    def _decode_binary(self, rows):
-        # As of Presto 0.69, binary data is returned as the varbinary type in base64 format
-        # This function decodes base64 data in place
+    def _process_data(self, rows):
         for i, col in enumerate(self.description):
-            if col[1] == 'varbinary':
+            col_type = col[1].split("(")[0].lower()
+            if col_type in TYPES_CONVERTER:
                 for row in rows:
                     if row[i] is not None:
-                        row[i] = base64.b64decode(row[i])
+                        row[i] = TYPES_CONVERTER[col_type](row[i])
 
     def _process_response(self, response):
         """Given the JSON response from Presto's REST API, update the internal state with the next
@@ -341,7 +347,7 @@ def _process_response(self, response):
         if 'data' in response_json:
             assert self._columns
             new_data = response_json['data']
-            self._decode_binary(new_data)
+            self._process_data(new_data)
             self._data += map(tuple, new_data)
             if 'nextUri' not in response_json:
                 self._state = self._STATE_FINISHED
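
The table-driven conversion introduced by ``_process_data`` can be exercised outside the cursor class. This is a standalone sketch under assumptions: ``process_data`` is a free function written for illustration, and the ``description`` and ``rows`` values are made-up sample data rather than a real Presto response.

```python
import base64
from decimal import Decimal

TYPES_CONVERTER = {
    "decimal": Decimal,
    "varbinary": base64.b64decode,
}


def process_data(description, rows):
    """Convert values in place, one column at a time."""
    for i, col in enumerate(description):
        # Strip type parameters: "decimal(10,2)" becomes "decimal".
        col_type = col[1].split("(")[0].lower()
        if col_type in TYPES_CONVERTER:
            for row in rows:
                if row[i] is not None:
                    row[i] = TYPES_CONVERTER[col_type](row[i])


# Sample data (hypothetical): (name, type) pairs and mutable rows.
description = [("price", "decimal(10,2)"), ("blob", "varbinary"), ("name", "varchar")]
rows = [["1.50", "aGVsbG8=", "a"], [None, None, "b"]]
process_data(description, rows)
print(rows)
```

Columns whose base type has no entry in the table, such as ``varchar`` here, pass through unchanged, and ``None`` values are never handed to a converter.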

pyhive/sasl_compat.py

Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
+# Original source of this file is https://github.com/cloudera/impyla/blob/master/impala/sasl_compat.py
+# which uses Apache-2.0 license as of 21 May 2023.
+# This code was added to Impyla in 2016 as a compatibility layer to allow use of either python-sasl or pure-sasl
+# via PR https://github.com/cloudera/impyla/pull/179
+# Even though thrift_sasl lists pure-sasl as dependency here https://github.com/cloudera/thrift_sasl/blob/master/setup.py#L34
+# but it still calls functions native to python-sasl in this file https://github.com/cloudera/thrift_sasl/blob/master/thrift_sasl/__init__.py#L82
+# Hence this code is required for the fallback to work.
+
+
+from puresasl.client import SASLClient, SASLError
+from contextlib import contextmanager
+
+@contextmanager
+def error_catcher(self, Exc = Exception):
+    try:
+        self.error = None
+        yield
+    except Exc as e:
+        self.error = str(e)
+
+
+class PureSASLClient(SASLClient):
+    def __init__(self, *args, **kwargs):
+        self.error = None
+        super(PureSASLClient, self).__init__(*args, **kwargs)
+
+    def start(self, mechanism):
+        with error_catcher(self, SASLError):
+            if isinstance(mechanism, list):
+                self.choose_mechanism(mechanism)
+            else:
+                self.choose_mechanism([mechanism])
+            return True, self.mechanism, self.process()
+        # else
+        return False, mechanism, None
+
+    def encode(self, incoming):
+        with error_catcher(self):
+            return True, self.unwrap(incoming)
+        # else
+        return False, None
+
+    def decode(self, outgoing):
+        with error_catcher(self):
+            return True, self.wrap(outgoing)
+        # else
+        return False, None
+
+    def step(self, challenge=None):
+        with error_catcher(self):
+            return True, self.process(challenge)
+        # else
+        return False, None
+
+    def getError(self):
+        return self.error
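
The ``# else`` comments in ``PureSASLClient`` depend on ``error_catcher`` swallowing the exception, so that control continues after the ``with`` block only when the body failed. A minimal sketch of that control flow, with no dependency on pure-sasl; the ``Demo`` class and its ``step`` method are hypothetical and exist only to show the pattern:

```python
from contextlib import contextmanager


@contextmanager
def error_catcher(holder, exc_type=Exception):
    try:
        holder.error = None
        yield
    except exc_type as e:
        # Record the message and suppress the exception.
        holder.error = str(e)


class Demo(object):
    """Hypothetical class, not part of PyHive or pure-sasl."""

    def __init__(self):
        self.error = None

    def step(self, value):
        with error_catcher(self, ValueError):
            return True, int(value)
        # Reached only when the body raised and the error was swallowed.
        return False, None


d = Demo()
print(d.step("42"), d.error)
print(d.step("x"), d.error)
```

A successful call returns from inside the ``with`` block and leaves ``error`` as ``None``. A failing call falls through to the second ``return``, and the message stays available on the object, which is what a ``getError`` accessor like the one above exposes.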
