Merge pull request #12 from hariharan-devarajan/feature/thread_id
Features to Improve documentation
hariharan-devarajan authored Oct 10, 2023
2 parents 3c2c45a + c1e85c7 commit 2bab541
Showing 11 changed files with 75 additions and 40 deletions.
51 changes: 22 additions & 29 deletions README.md
@@ -7,40 +7,33 @@ A low-level profiler for capture I/O calls from deep learning applications.

Requirements
1. Python > 3.8
2. spack

Using spack dlio-profiler
```
spack env create -d ./spack-env
spack env activate -p ./spack-env
spack add [email protected]
spack install
```

create a virtual env for your python package where u will use dlio_profiler.
```
python3 -m venv ./venv
source venv/bin/activate
pip install .
```
install in local user
```
export DLIO_LOGGER_USER=1
pip install .
```
install directly from github
```
pip install git+https://github.com/hariharan-devarajan/dlio-profiler.git
```
Build dlio profiler through cmake
## Build DLIO Profiler with pip

Users can easily install DLIO profiler using pip. This is the way most python packages are installed.
This method would work for both native python environments and conda environments.

### From source

```bash
git clone git@github.com:hariharan-devarajan/dlio-profiler.git
cd dlio-profiler
# You can skip this for installing the dev branch.
# for latest stable version use master branch.
git checkout tags/<Release> -b <Release>
pip install .
```
cd dlio-profiler
mkdir build
cd build
cmake -DCMAKE_INSTALL_PREFIX=../venv ../
make install -j

### From Github

```bash
DLP_VERSION=dev
pip install git+https://github.com/hariharan-devarajan/dlio-profiler.git@${DLP_VERSION}
```

For more build instructions check [here](https://dlio-profiler.readthedocs.io/en/latest/build.html)

Usage

```
8 changes: 4 additions & 4 deletions docs/build.rst
@@ -6,13 +6,13 @@ This section describes how to build DLIO Profiler.

There are three build options:

* build DLIO Profiler with pip (recommended),
* build DLIO Profiler with Spack, and
* build DLIO Profiler with cmake
- build DLIO Profiler with pip (recommended),
- build DLIO Profiler with Spack, and
- build DLIO Profiler with cmake

----------

-----------------------------------------
----------------------------
Build DLIO Profiler with pip
----------------------------

24 changes: 24 additions & 0 deletions docs/examples.rst
@@ -635,3 +635,27 @@ Job submition script
LD_PRELOAD=./dlio_ml_workloads/PolarisAT/conda-envs/ml_workload_latest_conda/lib/libdlio_profiler_preload.so aprun -n 4 -N 4 python resnet_hvd_dlio.py --batch-size 64 --epochs 1 > dlio_log 2>&1
cat *.pfw > combined_logs.pfw # To combine to a single pfw file.
***********************
Integrated Applications
***********************

Here is the list of applications that currently use DLIO Profiler.

1. `DLIO Benchmark <https://github.com/argonne-lcf/dlio_benchmark>`_
2. MuMMI
3. Resnet50 with pytorch and torchvision

****************************
Example Chrome Tracing Plots
****************************

Example of Unet3D application with DLIO Benchmark. This trace shows the first few steps of the benchmark.
Here, we can see that we can get application level calls (e.g., ``train`` and ``TorchDataset``) as well as low-level I/O calls (dark green color).

.. image:: images/tracing/trace.png
:width: 400
:alt: Unet3D applications


Binary file added docs/images/tracing/trace.png
6 changes: 6 additions & 0 deletions docs/limitations.rst
@@ -4,3 +4,9 @@ Limitations

On certain systems, spawning a process creates a new process that does not inherit environment variables.
In those cases, LD_PRELOAD or the Python module load does not load DLIO Profiler because the system strips these variables, which results in missing profiling information.

----------------
OS Compatibility
----------------

The profiler internally issues raw system calls such as ``getpid`` and ``gettid`` through ``syscall()``. ``gettid`` in particular is Linux-specific, so DLIO Profiler is currently supported only on Linux.
4 changes: 2 additions & 2 deletions src/dlio_profiler/core/dlio_profiler_main.cpp
@@ -100,15 +100,15 @@ dlio_profiler::DLIOProfilerCore::initlialize(bool is_init, bool _bind, const cha
gotcha_priority = atoi(dlio_profiler_priority_str); // GCOV_EXCL_LINE
}
if (_process_id == nullptr || *_process_id == -1) {
this->process_id = getpid();
this->process_id = dlp_getpid();
} else {
this->process_id = *_process_id;
}
DLIO_PROFILER_LOGINFO("Setting process_id to %d", this->process_id);
if (_log_file == nullptr) {
char *dlio_profiler_log = getenv(DLIO_PROFILER_LOG_FILE);
char proc_name[PATH_MAX], cmd[128];
sprintf(cmd, "/proc/%d/cmdline", getpid());
sprintf(cmd, "/proc/%d/cmdline", dlp_getpid());
int fd = dlp_open(cmd, O_RDONLY);
const char *exec_file_name = nullptr;
if (fd != -1) {
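
The hunk above builds the path `/proc/<pid>/cmdline` and opens it, apparently to recover the name of the running executable. As a rough illustration of that lookup, the sketch below reads the first NUL-terminated argument from the file. It uses `std::ifstream` instead of the profiler's `dlp_open`, and `read_exec_name` is a hypothetical helper that does not exist in DLIO Profiler.

```cpp
#include <cassert>      // assert(), handy for quick checks
#include <fstream>
#include <string>
#include <sys/types.h>  // pid_t
#include <unistd.h>     // getpid()

// Hypothetical helper (not part of DLIO Profiler): returns argv[0] of the
// process `pid` as recorded in /proc, or an empty string if the file
// cannot be opened (for example, when /proc is not mounted).
std::string read_exec_name(pid_t pid) {
  std::ifstream in("/proc/" + std::to_string(pid) + "/cmdline",
                   std::ios::binary);
  if (!in) return {};
  std::string first_arg;
  // Arguments in /proc/<pid>/cmdline are separated by NUL bytes,
  // so reading up to the first '\0' yields argv[0].
  std::getline(in, first_arg, '\0');
  return first_arg;
}
```

Calling `read_exec_name(getpid())` from inside a process returns the path it was launched with on a typical Linux system. Using `std::string` here also avoids the fixed-size `cmd[128]` buffer that `sprintf` writes into in the original code.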
2 changes: 1 addition & 1 deletion src/dlio_profiler/dlio_logger.h
@@ -51,7 +51,7 @@ class DLIOLogger {
fd = fileno(stdout);
log_file = "STDOUT";
} else {
int pid = getpid();
int pid = dlp_getpid();
log_file = std::string(dlio_profiler_log_dir) + "/" + "trace_ll_" + std::to_string(pid) + ".pfw";
} // GCOV_EXCL_STOP
}
8 changes: 8 additions & 0 deletions src/dlio_profiler/utils/posix_internal.cpp
@@ -46,3 +46,11 @@ int dlp_fsync(int fd) { // GCOV_EXCL_START
ssize_t dlp_readlink(const char *path, char *buf, size_t bufsize) {
return syscall(SYS_readlink, path, buf, bufsize);
}

pid_t dlp_gettid(){
return syscall(SYS_gettid);
}

pid_t dlp_getpid(){
return syscall(SYS_getpid);
}
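
The two new wrappers follow the same pattern as the existing `dlp_readlink`: they call `syscall()` with a `SYS_*` number instead of the libc function. A minimal standalone sketch of that pattern, assuming a Linux target, might look like the following. The `sketch_` names and the compile-time guard are illustrative additions and do not appear in the commit.

```cpp
#include <cassert>        // assert(), handy for quick checks
#include <sys/syscall.h>  // SYS_getpid, SYS_gettid
#include <sys/types.h>    // pid_t
#include <unistd.h>       // syscall(), getpid()

#if !defined(__linux__)
#error "This sketch relies on Linux-specific system calls (SYS_gettid)."
#endif

// Hypothetical stand-ins for dlp_getpid / dlp_gettid.
// syscall() returns long, so the result is narrowed to pid_t explicitly.
pid_t sketch_getpid() { return static_cast<pid_t>(syscall(SYS_getpid)); }
pid_t sketch_gettid() { return static_cast<pid_t>(syscall(SYS_gettid)); }

// Returns true when the raw system call agrees with libc's getpid().
bool raw_pid_matches_libc() { return sketch_getpid() == getpid(); }
```

The guard makes the Linux dependency fail at compile time rather than at run time, which matches the OS compatibility note added to `docs/limitations.rst` in this commit.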
4 changes: 4 additions & 0 deletions src/dlio_profiler/utils/posix_internal.h
@@ -25,4 +25,8 @@ int dlp_fsync(int fd);

ssize_t dlp_readlink(const char *path, char *buf, size_t bufsize);

pid_t dlp_gettid();

pid_t dlp_getpid();

#endif // DLIO_PROFILER_POSIX_INTERNAL_H
6 changes: 3 additions & 3 deletions src/dlio_profiler/writer/chrome_writer.cpp
@@ -78,11 +78,11 @@ dlio_profiler::ChromeWriter::convert_json(std::string &event_name, std::string &
std::stringstream all_stream;
int tid, pid;
if (process_id == -1) {
tid = std::hash<std::thread::id>{}(std::this_thread::get_id()) % 100000;
pid = getpid();
tid = dlp_gettid();
pid = dlp_getpid();
} else {
pid = process_id;
tid = getpid() + std::hash<std::thread::id>{}(std::this_thread::get_id()) % 100000;
tid = dlp_getpid() + dlp_gettid();
}
auto start_sec = std::chrono::duration<TimeResolution, std::ratio<1>>(start_time);
auto duration_sec = std::chrono::duration<TimeResolution, std::ratio<1>>(duration);
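
After this change, the `pid` and `tid` written to the trace depend on whether a process id override is set. The branch can be restated as a pure function, which makes the two cases easier to reason about and to test. This is a sketch only: `TraceIds` and `select_trace_ids` are hypothetical names, and the real code obtains the ids from `dlp_getpid()` and `dlp_gettid()` rather than taking them as parameters.

```cpp
#include <cassert>  // assert(), handy for quick checks

// Hypothetical value type holding the ids emitted with a trace event.
struct TraceIds {
  int pid;
  int tid;
};

// Mirrors the post-commit branch in ChromeWriter::convert_json:
//  - no override (process_id == -1): report the real pid and kernel tid;
//  - override set: report the override as pid, and offset the tid by the
//    real pid (tid = real_pid + real_tid).
TraceIds select_trace_ids(int process_id, int real_pid, int real_tid) {
  if (process_id == -1) {
    return {real_pid, real_tid};
  }
  return {process_id, real_pid + real_tid};
}
```

With `real_pid = 100` and `real_tid = 101`, passing `process_id = -1` gives `{100, 101}`, while passing `process_id = 7` gives `{7, 201}`. Compared with the previous `std::hash<std::thread::id> % 100000` value, the kernel thread id can be matched against what other Linux tools report for the same thread.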
2 changes: 1 addition & 1 deletion src/dlio_profiler/writer/chrome_writer.h
@@ -60,7 +60,7 @@ namespace dlio_profiler {
if (enable_core_affinity_str != nullptr && strcmp(enable_core_affinity_str, "1") == 0) {
enable_core_affinity = true;
}
process_id = getpid();
process_id = dlp_getpid();
this->fd = fd;
if (enable_core_affinity) {
hwloc_topology_init(&topology); // initialization
