# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: High-Performance Data Analytics in Python
message: 'If you found this lesson useful, please cite it as below.'
type: software
authors:
  - family-names: Fiusco
    given-names: Francesco
    orcid: 'https://orcid.org/0000-0003-0716-465X'
  - family-names: Li
    given-names: Qiang
    orcid: 'https://orcid.org/0000-0002-6390-0343'
  - family-names: Mohanan
    given-names: Ashwin Vishnu
    orcid: 'https://orcid.org/0000-0002-2979-6327'
  - family-names: Wang
    given-names: Yonglei
    orcid: 'https://orcid.org/0000-0003-3393-7257'
  - family-names: Wikfeldt
    given-names: Kjartan Thor
    orcid: 'https://orcid.org/0000-0002-1655-3676'
doi: 10.5281/zenodo.14844443
identifiers:
  - type: doi
    value: 10.5281/zenodo.14844443
    description: Citeable archive in Zenodo
  # - type: url
  #   value: 'https://enccs.github.io/hpda-python/'
  #   description: Rendered course material in HTML
repository-code: 'https://github.com/ENCCS/hpda-python'
url: 'https://enccs.github.io/hpda-python/'
abstract: >-
  ENCCS lesson material on High-Performance Data Analytics
  in Python. Scientists, engineers and professionals from
  many sectors are seeing enormous growth in the size and
  number of datasets relevant to their domains. Professional
  titles such as data scientist and data engineer have
  emerged to describe specialists working with data, but
  other experts are also finding it necessary to learn tools
  and techniques for working with big data. Typical tasks
  include preprocessing, analysing, modelling and visualising
  data. Python is an industry-standard programming language
  for working with data at all levels of the data analytics
  pipeline, in large part because of its rich ecosystem of
  libraries, ranging from generic numerical libraries to
  special-purpose and domain-specific packages, often
  supported by large developer communities and stable
  funding sources.

  This lesson aims to give an overview of working with
  research data in Python using general-purpose libraries
  for storing, processing, analysing and sharing data, with
  an emphasis on high performance. After covering tools for
  performant processing on a single workstation, the lesson
  moves on to profiling and optimisation, and then to
  parallel and distributed computing.
keywords:
  - python
  - HPDA
  - big data
  - array computing
  - parallel computing
  - benchmarking
  - profiling
  - optimizing
  - numpy
  - pandas
  - dask
license: CC-BY-4.0
version: 2025.02.10
date-released: '2025-02-10'