Commit 32ee963

betodealmeida, dpgaspar, bkyryliuk, serenajiang, and Usiel authored
feat: add JWT support to PyHive (#1)
* feat: add HTTP and HTTPS to hive (dropbox#385)
* feat: add https protocol
* support HTTP
* fix: make hive https py2 compat (dropbox#389)
* fix: make hive https py2 compat
* fix lint
* Update README.rst (dropbox#423)
* chore: rename Trino entry point (dropbox#428)
* Support for Presto decimals (dropbox#430)
* Support for Presto decimals
* lower
* Use str type for driver and name in HiveDialect (dropbox#450)

  PyHive's HiveDialect usage of bytes for the name and driver fields is not the norm and is causing issues upstream: apache/superset#22316

  Even other dialects within PyHive use strings. SQLAlchemy does not strictly require a string, but all the stock dialects return a string, so I figure it is heavily implied. I think the risk of breaking something upstream with this change is low (but it is there ofc). I figure in most cases we just make someone's `str(dialect.driver)` expression redundant.

  Examples for some of the other stock sqlalchemy dialects (name and driver fields using str):
  https://github.com/zzzeek/sqlalchemy/blob/main/lib/sqlalchemy/dialects/sqlite/pysqlite.py#L501
  https://github.com/zzzeek/sqlalchemy/blob/main/lib/sqlalchemy/dialects/sqlite/base.py#L1891
  https://github.com/zzzeek/sqlalchemy/blob/main/lib/sqlalchemy/dialects/mysql/base.py#L2383
  https://github.com/zzzeek/sqlalchemy/blob/main/lib/sqlalchemy/dialects/mysql/mysqldb.py#L113
  https://github.com/zzzeek/sqlalchemy/blob/main/lib/sqlalchemy/dialects/mysql/pymysql.py#L59
* Correcting Iterable import for python 3.10 (dropbox#451)
* changing drivers to support hive, presto and trino with sqlalchemy>=2.0 (dropbox#448)
* Revert "changing drivers to support hive, presto and trino with sqlalchemy>=2.0 (dropbox#448)" (dropbox#452)

  This reverts commit b0206d3.
* Update __init__.py (dropbox#453)

  dropbox@1c1da8b dropbox@1f99552
* use pure-sasl with python 3.11 (dropbox#454)
* minimal changes for sqlalchemy 2.0 support (dropbox#457)
* update readme to reflect recent changes (dropbox#459)
* Update README.rst (dropbox#475)
* Update README.rst (dropbox#476)
* feat: JWT support
* Add CI to build package

---------

Co-authored-by: Daniel Vaz Gaspar <[email protected]>
Co-authored-by: Bogdan <[email protected]>
Co-authored-by: serenajiang <[email protected]>
Co-authored-by: Usiel Riedl <[email protected]>
Co-authored-by: Multazim Deshmukh <[email protected]>
Co-authored-by: nicholas-miles <[email protected]>
1 parent d6e7140 commit 32ee963

23 files changed: +1366 -321 lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -14,3 +14,4 @@ cover/
 .cache/
 *.iml
 /scripts/.thrift_gen
+.python-version

CHANGELOG.md

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+0.7.0a
+======
+
+- Add support for JWT authentication.

Jenkinsfile

Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
+LIB_NAME = 'PyHive'
+String currentVersion = ""
+
+
+podTemplate(
+    imagePullSecrets: ['preset-pull'],
+    nodeUsageMode: 'NORMAL',
+    containers: [
+        containerTemplate(
+            alwaysPullImage: true,
+            name: 'ci',
+            image: 'preset/ci:latest',
+            ttyEnabled: true,
+            command: 'cat',
+            resourceRequestCpu: '100m',
+            resourceLimitCpu: '200m',
+            resourceRequestMemory: '1000Mi',
+            resourceLimitMemory: '2000Mi',
+        ),
+        containerTemplate(
+            alwaysPullImage: true,
+            name: 'py-ci',
+            image: 'preset/python:3.8.9-ci',
+            ttyEnabled: true,
+            command: 'cat'
+        )
+    ]
+) {
+    node(POD_LABEL) {
+        container('py-ci') {
+            stage('Checkout') {
+                checkout scm
+            }
+
+            stage('Tests') {
+                sh(script: 'pip install -e . && pip install -r requirements-dev.txt', label: 'install dependencies')
+                parallel(
+                    check: {
+                        currentVersion = sh(
+                            script: "python setup.py --version",
+                            returnStdout: true,
+                            label: 'Get current version'
+                        ).trim()
+                        def retVal = sh(
+                            script: "curl -I -f https://pypi.devops.preset.zone/${LIB_NAME}/${LIB_NAME}-${currentVersion}.tar.gz",
+                            returnStatus: true,
+                            label: 'Check for existing tarball'
+                        )
+                        // If the thing exists, we should bail as we don't want to overwrite
+                        if (retVal == 0) {
+                            error("Version ${currentVersion} of ${LIB_NAME} already exists! Version bump required.")
+                        }
+                    }
+                )
+            }
+        }
+
+        container('py-ci') {
+            stage('Package Release') {
+                if (env.BRANCH_NAME.startsWith("PR-")) {
+                    def shortGitRev = sh(
+                        returnStdout: true,
+                        script: 'git rev-parse --short HEAD'
+                    ).trim()
+                    def pullRequestVersion = "${currentVersion}+${env.BRANCH_NAME}.${shortGitRev}"
+                    sh(script:"sed -i \'s/version = ${currentVersion}/version = ${pullRequestVersion}/g\' setup.cfg", label: 'Changing version for PR')
+                    sh(script:"echo PR version: ${pullRequestVersion}", label: 'PR Release candidate version')
+                }
+                sh(script: 'python setup.py sdist --formats=gztar', label: 'Bundling release')
+                sh(script: "mkdir -p dist/${LIB_NAME} && mv dist/*.gz dist/${LIB_NAME}", label: 'Setup release folder')
+            }
+        }
+
+        container('ci') {
+            stage('Upload Release') {
+                withCredentials([
+                    [
+                        $class : 'AmazonWebServicesCredentialsBinding',
+                        credentialsId : 'ci-user',
+                        accessKeyVariable: 'AWS_ACCESS_KEY_ID',
+                        secretKeyVariable: 'AWS_SECRET_ACCESS_KEY',
+                    ]
+                ]) {
+                    if ((env.BRANCH_NAME == 'master') || (env.BRANCH_NAME.startsWith("PR-"))) {
+                        sh(script: "aws s3 sync ./dist s3://preset-pypi", label: "Uploading to s3")
+                    }
+                    else {
+                        echo "Skipping upload as this isn't master..."
+                    }
+                }
+            }
+        }
+    }
+}
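
The version handling in the Jenkinsfile above (probe for an existing tarball, derive a PR-specific version, decide whether to upload) can be sketched in Python roughly as follows. This is an illustration only: the function names `tarball_url`, `pr_version`, and `should_upload` are hypothetical and do not appear in the commit, and the sample values in the usage note are made up.

```python
def tarball_url(lib_name, version, index="https://pypi.devops.preset.zone"):
    """Build the URL that the 'Check for existing tarball' step probes with curl."""
    return "{0}/{1}/{1}-{2}.tar.gz".format(index, lib_name, version)


def pr_version(current_version, branch_name, short_git_rev):
    """Mirror the pullRequestVersion string: <version>+<branch>.<short rev>."""
    return "{}+{}.{}".format(current_version, branch_name, short_git_rev)


def should_upload(branch_name):
    """The upload stage runs only for master and for PR-* branches."""
    return branch_name == "master" or branch_name.startswith("PR-")
```

For example, `pr_version("0.7.0a", "PR-1", "abc1234")` yields `0.7.0a+PR-1.abc1234`, the same shape as the string the 'Package Release' stage substitutes into `setup.cfg`.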

README.rst

Lines changed: 72 additions & 21 deletions
@@ -1,24 +1,31 @@
-.. image:: https://travis-ci.org/dropbox/PyHive.svg?branch=master
-    :target: https://travis-ci.org/dropbox/PyHive
-.. image:: https://img.shields.io/codecov/c/github/dropbox/PyHive.svg
+========================================================
+PyHive project has been donated to Apache Kyuubi
+========================================================
+
+You can follow it's development and report any issues you are experiencing here: https://github.com/apache/kyuubi/tree/master/python/pyhive
+
+
+
+Legacy notes / instructions
+===========================
 
-======
 PyHive
-======
+**********
+
 
 PyHive is a collection of Python `DB-API <http://www.python.org/dev/peps/pep-0249/>`_ and
-`SQLAlchemy <http://www.sqlalchemy.org/>`_ interfaces for `Presto <http://prestodb.io/>`_ and
-`Hive <http://hive.apache.org/>`_.
+`SQLAlchemy <http://www.sqlalchemy.org/>`_ interfaces for `Presto <http://prestodb.io/>`_ ,
+`Hive <http://hive.apache.org/>`_ and `Trino <https://trino.io/>`_.
 
 Usage
-=====
+**********
 
 DB-API
 ------
 .. code-block:: python
 
     from pyhive import presto  # or import hive or import trino
-    cursor = presto.connect('localhost').cursor()
+    cursor = presto.connect('localhost').cursor()  # or use hive.connect or use trino.connect
     cursor.execute('SELECT * FROM my_awesome_data LIMIT 10')
     print cursor.fetchone()
     print cursor.fetchall()
@@ -54,7 +61,7 @@ In Python 3.7 `async` became a keyword; you can use `async_` instead:
 
 SQLAlchemy
 ----------
-First install this package to register it with SQLAlchemy (see ``setup.py``).
+First install this package to register it with SQLAlchemy, see ``entry_points`` in ``setup.py``.
 
 .. code-block:: python
 
@@ -64,12 +71,33 @@ First install this package to register it with SQLAlchemy (see ``setup.py``).
     # Presto
     engine = create_engine('presto://localhost:8080/hive/default')
     # Trino
-    engine = create_engine('trino://localhost:8080/hive/default')
+    engine = create_engine('trino+pyhive://localhost:8080/hive/default')
     # Hive
     engine = create_engine('hive://localhost:10000/default')
+
+    # SQLAlchemy < 2.0
     logs = Table('my_awesome_data', MetaData(bind=engine), autoload=True)
     print select([func.count('*')], from_obj=logs).scalar()
 
+    # Hive + HTTPS + LDAP or basic Auth
+    engine = create_engine('hive+https://username:password@localhost:10000/')
+    logs = Table('my_awesome_data', MetaData(bind=engine), autoload=True)
+    print select([func.count('*')], from_obj=logs).scalar()
+
+    # SQLAlchemy >= 2.0
+    metadata_obj = MetaData()
+    books = Table("books", metadata_obj, Column("id", Integer), Column("title", String), Column("primary_author", String))
+    metadata_obj.create_all(engine)
+    inspector = inspect(engine)
+    inspector.get_columns('books')
+
+    with engine.connect() as con:
+        data = [{ "id": 1, "title": "The Hobbit", "primary_author": "Tolkien" },
+                { "id": 2, "title": "The Silmarillion", "primary_author": "Tolkien" }]
+        con.execute(books.insert(), data[0])
+        result = con.execute(text("select * from books"))
+        print(result.fetchall())
+
 Note: query generation functionality is not exhaustive or fully tested, but there should be no
 problem with raw SQL.
 
@@ -89,7 +117,7 @@ Passing session configuration
                       'session_props': {'query_max_run_time': '1234m'}}
     )
     create_engine(
-        'trino://user@host:443/hive',
+        'trino+pyhive://user@host:443/hive',
         connect_args={'protocol': 'https',
                       'session_props': {'query_max_run_time': '1234m'}}
     )
@@ -104,27 +132,30 @@ Passing session configuration
     )
 
 Requirements
-============
+************
 
 Install using
 
-- ``pip install 'pyhive[hive]'`` for the Hive interface and
-- ``pip install 'pyhive[presto]'`` for the Presto interface.
+- ``pip install 'pyhive[hive]'`` or ``pip install 'pyhive[hive_pure_sasl]'`` for the Hive interface
+- ``pip install 'pyhive[presto]'`` for the Presto interface
 - ``pip install 'pyhive[trino]'`` for the Trino interface
 
+Note: ``'pyhive[hive]'`` extras uses `sasl <https://pypi.org/project/sasl/>`_ that doesn't support Python 3.11, See `github issue <https://github.com/cloudera/python-sasl/issues/30>`_.
+Hence PyHive also supports `pure-sasl <https://pypi.org/project/pure-sasl/>`_ via additional extras ``'pyhive[hive_pure_sasl]'`` which support Python 3.11.
+
 PyHive works with
 
 - Python 2.7 / Python 3
-- For Presto: Presto install
-- For Trino: Trino install
+- For Presto: `Presto installation <https://prestodb.io/docs/current/installation.html>`_
+- For Trino: `Trino installation <https://trino.io/docs/current/installation.html>`_
 - For Hive: `HiveServer2 <https://cwiki.apache.org/confluence/display/Hive/Setting+up+HiveServer2>`_ daemon
 
 Changelog
-=========
+*********
 See https://github.com/dropbox/PyHive/releases.
 
 Contributing
-============
+************
 - Please fill out the Dropbox Contributor License Agreement at https://opensource.dropbox.com/cla/ and note this in your pull request.
 - Changes must come with tests, with the exception of trivial things like fixing comments. See .travis.yml for the test environment setup.
 - Notes on project scope:
@@ -134,8 +165,28 @@ Contributing
   - We prefer having a small number of generic features over a large number of specialized, inflexible features.
     For example, the Presto code takes an arbitrary ``requests_session`` argument for customizing HTTP calls, as opposed to having a separate parameter/branch for each ``requests`` option.
 
+Tips for test environment setup
+****************************************
+You can setup test environment by following ``.travis.yaml`` in this repository. It uses `Cloudera's CDH 5 <https://docs.cloudera.com/documentation/enterprise/release-notes/topics/cdh_vd_cdh_download_510.html>`_ which requires username and password for download.
+It may not be feasible for everyone to get those credentials. Hence below are alternative instructions to setup test environment.
+
+You can clone `this repository <https://github.com/big-data-europe/docker-hive/blob/master/docker-compose.yml>`_ which has Docker Compose setup for Presto and Hive.
+You can add below lines to its docker-compose.yaml to start Trino in same environment::
+
+    trino:
+        image: trinodb/trino:351
+        ports:
+            - "18080:18080"
+        volumes:
+            - ./trino:/etc/trino
+
+Note: ``./trino`` for docker volume defined above is `trino config from PyHive repository <https://github.com/dropbox/PyHive/tree/master/scripts/travis-conf/trino>`_
+
+Then run::
+    docker-compose up -d
+
 Testing
-=======
+*******
 .. image:: https://travis-ci.org/dropbox/PyHive.svg
     :target: https://travis-ci.org/dropbox/PyHive
 .. image:: http://codecov.io/github/dropbox/PyHive/coverage.svg?branch=master
@@ -154,7 +205,7 @@ WARNING: This drops/creates tables named ``one_row``, ``one_row_complex``, and `
 database called ``pyhive_test_database``.
 
 Updating TCLIService
-====================
+********************
 
 The TCLIService module is autogenerated using a ``TCLIService.thrift`` file. To update it, the
 ``generate.py`` file can be used: ``python generate.py <TCLIServiceURL>``. When left blank, the

dev_requirements.txt

Lines changed: 2 additions & 0 deletions
@@ -12,6 +12,8 @@ pytest-timeout==1.2.0
 requests>=1.0.0
 requests_kerberos>=0.12.0
 sasl>=0.2.1
+pure-sasl>=0.6.2
+kerberos>=1.3.0
 thrift>=0.10.0
 #thrift_sasl>=0.1.0
 git+https://github.com/cloudera/thrift_sasl  # Using master branch in order to get Python 3 SASL patches
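
With both `sasl` and `pure-sasl` listed as development requirements, it can be useful to check which SASL backend is importable in a given environment. The following is a minimal sketch, not PyHive's own selection logic: the helper `pick_sasl_backend` is hypothetical, and it assumes the two distributions install the top-level modules `sasl` and `puresasl` respectively.

```python
import importlib.util


def pick_sasl_backend(find_spec=importlib.util.find_spec):
    """Return the first importable SASL module name, or None if neither is installed.

    `find_spec` is injectable so the lookup can be exercised without
    installing either package.
    """
    # Assumption: 'sasl' is preferred when present, 'puresasl' is the fallback.
    for module_name in ("sasl", "puresasl"):
        if find_spec(module_name) is not None:
            return module_name
    return None
```

Calling `pick_sasl_backend()` with no arguments inspects the current interpreter. On a Python 3.11 environment where only `pure-sasl` is installed, it would be expected to return `"puresasl"`.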

pyhive/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1,3 +1,3 @@
 from __future__ import absolute_import
 from __future__ import unicode_literals
-__version__ = '0.6.3'
+__version__ = '0.7.0a'

pyhive/common.py

Lines changed: 6 additions & 1 deletion
@@ -18,6 +18,11 @@
 from future.utils import with_metaclass
 from itertools import islice
 
+try:
+    from collections.abc import Iterable
+except ImportError:
+    from collections import Iterable
+
 
 class DBAPICursor(with_metaclass(abc.ABCMeta, object)):
     """Base class for some common DB-API logic"""
@@ -245,7 +250,7 @@ def escape_item(self, item):
             return self.escape_number(item)
         elif isinstance(item, basestring):
             return self.escape_string(item)
-        elif isinstance(item, collections.Iterable):
+        elif isinstance(item, Iterable):
             return self.escape_sequence(item)
         elif isinstance(item, datetime.datetime):
             return self.escape_datetime(item, self._DATETIME_FORMAT)
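
The import change above matters because the `collections.Iterable` alias was removed in Python 3.10, leaving `collections.abc.Iterable` as the only location. A simplified, Python 3 only sketch of the same fallback import and type dispatch is shown below. The `classify` function is a hypothetical stand-in for `escape_item`, returning labels instead of escaped SQL.

```python
try:
    # Python 3.3+ location; the only one available from Python 3.10 onward.
    from collections.abc import Iterable
except ImportError:
    # Fallback for interpreters that predate collections.abc.
    from collections import Iterable


def classify(item):
    """Return which escape path an item would take (illustrative labels only)."""
    if isinstance(item, (int, float)):
        return "number"
    # Strings must be tested before Iterable, because str is itself iterable.
    if isinstance(item, str):
        return "string"
    if isinstance(item, Iterable):
        return "sequence"
    return "unsupported"
```

The ordering of the checks mirrors the diff: if the `Iterable` test came first, every string would be treated as a sequence of characters.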
