Simplify the cache module and interface (#20)
- Supports tags
- Fixed issues with sqlalchemy and session handling
- Supports various compression methods in the hashlib library
- Cleanup expired resources in the cache
- Updated documentation & tests
jkanche authored Dec 3, 2024
1 parent 24b7702 commit 74f8ade
Showing 15 changed files with 963 additions and 489 deletions.
22 changes: 21 additions & 1 deletion CHANGELOG.md
@@ -1,6 +1,26 @@
# Changelog

## Version 0.4 (development)
## Version 0.5.0

- SQLAlchemy session management
  * Implemented proper session handling
  * Fixed `DetachedInstanceError` issues and added helper method `_get_detached_resource` for consistent session management
  * Improved transaction handling with commits and rollbacks

- New features
  * Added cache statistics with `get_stats()` method
  * Implemented resource tagging
  * Added cache size management
  * Added support for file compression
  * Added resource validation with checksums
  * Improved search
  * Added metadata export/import functionality
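
Checksum validation, as listed above, generally means recording a digest when a file enters the cache and recomputing it later to detect corruption. The following is a minimal sketch using only the standard library. `file_checksum` and `is_valid` are hypothetical helper names chosen for illustration; they are not taken from pyBiocFileCache, and the algorithm the package actually uses is not stated in this changelog (SHA-256 is an assumption here).

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never loaded fully into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_valid(path: Path, expected: str) -> bool:
    """Compare a file's current digest with a previously recorded one."""
    return file_checksum(path) == expected
```

A caller would store the result of `file_checksum` alongside the resource record and call `is_valid` before handing the cached file back.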

## Version 0.4.1

- Method to list all resources.

## Version 0.4

- Migrate the schema to match R/Bioconductor's BiocFileCache (Check out [this issue](https://github.com/BiocPy/pyBiocFileCache/issues/11)). Thanks to [@khoroshevskyi](https://github.com/khoroshevskyi) for the PR.

107 changes: 62 additions & 45 deletions README.md
@@ -4,74 +4,91 @@

# pyBiocFileCache

File system based cache for resources & metadata. Compatible with [BiocFileCache R package](https://github.com/Bioconductor/BiocFileCache)
`pyBiocFileCache` is a Python package that provides a robust file caching system with resource validation, cache size management, file compression, and resource tagging. Compatible with [BiocFileCache R package](https://github.com/Bioconductor/BiocFileCache).

***Note: Package is in development. Use with caution!!***
## Installation

### Installation
Install from [PyPI](https://pypi.org/project/pyBiocFileCache/):

Package is published to [PyPI](https://pypi.org/project/pyBiocFileCache/)

```
```bash
pip install pybiocfilecache
```

#### Initialize a cache directory
## Quick Start

```
from pybiocfilecache import BiocFileCache
import os
bfc = BiocFileCache(cache_dir = os.getcwd() + "/cache")
```
```python
from pybiocfilecache import BiocFileCache

Once the cache directory is created, the library provides methods to
- `add`: Add a resource or artifact to cache
- `get`: Get the resource from cache
- `remove`: Remove a resource from cache
- `update`: Update the resource in cache
- `purge`: Purge the entire cache, removing all files in the cache directory
# Initialize cache
cache = BiocFileCache("path/to/cache/directory")

### Add a resource to cache
# Add a file to cache
resource = cache.add("myfile", "path/to/file.txt")

(for testing use the temp files in the `tests/data` directory)
# Retrieve a file from cache
resource = cache.get("myfile")

```
rec = bfc.add("test1", os.getcwd() + "/test1.txt")
print(rec)
# Use the cached file
print(resource.rpath) # Path to cached file
```

### Get resource from cache
## Advanced Usage

```
rec = bfc.get("test1")
print(rec)
```
### Configuration

### Remove resource from cache
```python
from pybiocfilecache import BiocFileCache, CacheConfig
from datetime import timedelta
from pathlib import Path

```
rec = bfc.remove("test1")
print(rec)
# Create custom configuration
config = CacheConfig(
    cache_dir=Path("cache_directory"),
    max_size_bytes=1024 * 1024 * 1024,  # 1 GiB
    cleanup_interval=timedelta(days=7),
    compression=True
)

# Initialize cache with configuration
cache = BiocFileCache(config=config)
```

### Update resource in cache
### Resource Management

```
rec = bfc.update("test1", os.getcwd() + "/test2.txt")
print(rec)
```
```python
# Add file with tags and expiration
from datetime import datetime, timedelta

### Purge the cache
resource = cache.add(
    "myfile",
    "path/to/file.txt",
    tags=["data", "raw"],
    expires=datetime.now() + timedelta(days=30)
)

```
bfc.purge()
# List resources by tag
resources = cache.list_resources(tag="data")

# Search resources
results = cache.search("myfile", field="rname")

# Update resource
cache.update("myfile", "path/to/new_file.txt")

# Remove resource
cache.remove("myfile")
```

### Cache Statistics and Maintenance

<!-- pyscaffold-notes -->
```python
# Get cache statistics
stats = cache.get_stats()
print(stats)

## Note
# Clean up expired resources
removed_count = cache.cleanup()

This project has been set up using PyScaffold 4.1. For details and usage
information on PyScaffold see https://pyscaffold.org/.
# Purge entire cache
cache.purge()
```
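
To make the idea behind the cleanup step concrete, here is a small self-contained sketch of expiry-based cleanup over an in-memory mapping. It only illustrates the concept of dropping entries whose expiration time has passed and returning how many were removed. `cleanup_expired` and its arguments are hypothetical names, and this is not how the package's `cleanup()` is implemented (the real method works against the cache's database and files).

```python
from datetime import datetime, timedelta
from typing import Dict, Optional


def cleanup_expired(expiries: Dict[str, Optional[datetime]], now: datetime) -> int:
    """Remove entries whose expiry time is earlier than `now`.

    Entries with an expiry of None never expire. Returns the number removed.
    """
    expired = [
        name
        for name, when in expiries.items()
        if when is not None and when < now
    ]
    for name in expired:
        del expiries[name]
    return len(expired)


# Example: one entry expired yesterday, one expires in 30 days, one never expires.
now = datetime(2024, 12, 3)
entries = {
    "old": now - timedelta(days=1),
    "fresh": now + timedelta(days=30),
    "permanent": None,
}
removed_count = cleanup_expired(entries, now)
```

Passing `now` explicitly, rather than calling `datetime.now()` inside the function, keeps the sketch deterministic and easy to test.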
28 changes: 28 additions & 0 deletions docs/best_practices.md
@@ -0,0 +1,28 @@
# Best Practices

1. Use context managers for cleanup:
```python
with BiocFileCache("cache_directory") as cache:
    cache.add("myfile", "path/to/file.txt")
```

2. Add tags for better organization:
```python
cache.add("data.csv", "data.csv", tags=["raw", "csv", "2024"])
```

3. Set expiration dates for temporary files:
```python
cache.add("temp.txt", "temp.txt", expires=datetime.now() + timedelta(hours=1))
```

4. Regular maintenance:
```python
# Periodically clean up expired resources
cache.cleanup()

# Monitor cache size
stats = cache.get_stats()
if stats["cache_size_bytes"] > threshold:  # `threshold`: a byte limit you define
    ...  # Take action, e.g. remove resources you no longer need
```
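
The size check in practice 4 can also be done independently of the cache object by measuring the cache directory on disk. The sketch below uses only the standard library. `directory_size_bytes` and `over_threshold` are hypothetical helpers for illustration, not part of pyBiocFileCache, and the on-disk total may differ from what `get_stats()` reports.

```python
from pathlib import Path


def directory_size_bytes(root: Path) -> int:
    """Sum the sizes of all regular files under `root`, recursively."""
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def over_threshold(root: Path, threshold_bytes: int) -> bool:
    """Return True when the directory's total file size exceeds the limit."""
    return directory_size_bytes(root) > threshold_bytes
```

A scheduled job could call `over_threshold` and, when it returns `True`, run the cache's own cleanup before the directory grows further.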
2 changes: 1 addition & 1 deletion setup.cfg
@@ -47,7 +47,7 @@ package_dir =
# For more information, check out https://semver.org/.
install_requires =
importlib-metadata; python_version<"3.8"
sqlalchemy>=2,<2.1
sqlalchemy

[options.packages.find]
where = src