
Releases: mar-file-system/GUFI

0.6.5

30 Jul 21:51

gufi_query

  • Amortize creation of external database views when -Q is not provided
  • Amortize creation of xattr views when -x is not provided

gufi_rollup

  • Clear out old rollup data before copying new data in
    • Unified with gufi_unrollup SQL
  • treesummary is no longer copied upwards when rolling up
  • Index summary.inode to speed up queries
  • Fixed accidental modification of index during dry run

QueuePerThreadPool

  • Claimed work can now be stolen to prevent starvation caused by long running threads
  • If there is work that can be stolen, at least one work item will be taken even if the multiplier results in 0
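The "at least one item" rule can be pictured with a small sketch. It assumes the number of items stolen is the victim queue's length times a multiplier, rounded down. The `steal` helper and the deque representation are hypothetical and are not the QueuePerThreadPool C API. The sketch also ignores claimed work.

```python
from collections import deque


def steal(victim: deque, multiplier: float) -> list:
    """Take work items from another thread's queue (hypothetical helper).

    The share taken is len(victim) * multiplier, rounded down, but at least
    one item is taken whenever the victim has any work at all.
    """
    if not victim:
        return []
    count = max(1, int(len(victim) * multiplier))
    count = min(count, len(victim))
    # Take from the tail so the victim keeps the items at its head.
    return [victim.pop() for _ in range(count)]
```

With three queued items and a multiplier of 0.1, the product rounds down to 0, yet one item is still taken.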

External Databases:

  • Admins no longer have to know what files to track
  • Changed external databases to be set by users in per-directory files called external.gufi that list one path per line
    • Relative paths will be treated as relative to the source tree (not the current directory in the index)
    • Changed -q to check that external db files listed are valid
  • Now tracked in trace files (old trace files do not have to be changed)
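A minimal reader for an external.gufi file might look like the following. It assumes that relative entries resolve against the source directory containing the file and that blank lines are skipped. Both are assumptions, and `read_external_gufi` is a hypothetical name, not a GUFI function.

```python
import os


def read_external_gufi(listing_path: str, source_dir: str) -> list:
    """Parse a per-directory external.gufi file (hypothetical helper).

    One path per line. Relative paths are resolved against source_dir,
    the directory in the source tree, not the directory in the index.
    """
    paths = []
    with open(listing_path) as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # assumption: blank lines are ignored
            if not os.path.isabs(line):
                line = os.path.join(source_dir, line)
            paths.append(os.path.normpath(line))
    return paths
```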

contrib/gufi_sqlite -> src/gufi_sqlite3

  • Added printing of results; previous usage did not require it

NEW gufi_index2dir

  • Convert an index into a source tree with file sizes of 0

NEW gufi_trace2dir

  • Convert trace files into a source tree with file sizes of 0

NEW parallel_cpr

  • Parallel cp -r
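A parallel recursive copy can be done in two phases: create the directories first, then copy the files concurrently. The sketch below is one simple way to do this with `concurrent.futures`. It is not how GUFI's parallel_cpr is implemented, and the function name and worker count are illustrative. Symlinks and special files get no special handling here: `os.walk` does not descend into symlinked directories, and `shutil.copy2` copies the target of a file symlink.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor


def parallel_cpr(src: str, dst: str, workers: int = 4) -> None:
    """Recreate directories serially, then copy regular files in parallel."""
    files = []
    for root, _dirs, names in os.walk(src):
        rel = os.path.relpath(root, src)
        os.makedirs(os.path.join(dst, rel), exist_ok=True)
        files.extend(os.path.join(rel, name) for name in names)

    def copy_one(rel: str) -> None:
        shutil.copy2(os.path.join(src, rel), os.path.join(dst, rel))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every copy and re-raises the first error, if any.
        list(pool.map(copy_one, files))
```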

Misc:

  • When descending a directory, if struct dirent d_type is not set, fall back to calling lstat(2)
  • Updated opendb behavior
  • Updated dupdir behavior
  • No longer replacing both search and prefix with prefix in regression test output

GitHub Actions:

  • Restored code coverage report with codecov
  • Updated actions/checkout to v4
  • Updated actions/cache to v4
  • Added Rocky Linux 9

0.6.4

14 May 19:30

New: External User Databases:

  • Allows for arbitrary user data to be attached to filesystem metadata and queried
    • Can be rolled up
  • gufi_dir2index/gufi_dir2trace -q
    • Added e type to trace file format - does not affect old trace files
  • gufi_query -I -Q
    • Added new views: esummary, epentries, exsummary, expentries, evrsummary, evrpentries, evrxsummary, evrxpentries
      • Always available, but will not be filled unless -Q is used.
    • Reorganized processdir to be easier to read
  • External user database count is tracked in treesummary
  • Removed attachname column from external_dbs

Extended Attributes:

  • xattrs view and convenience views are now always available, but only filled when -x is passed to gufi_query

gufi_client now calls ssh with subprocess.Popen instead of paramiko

parallel_rmr top-level directory bug fix

Longitudinal Snapshot:

  • More columns
  • Cache intermediate results
  • Allow for rolled up indexes
  • Different views
    • Graph (G)
    • Per Level (L)
    • Siblings (S)
    • Per Directory (D)

Dependencies:

  • Updated sqlite3-pcre to use pcre2
    • Existing installs should delete the GUFI sqlite3-pcre build/install and rebuild
  • Removed paramiko tarball
  • Added new SQLite3 patch to increase attach limit to 254
    • Existing installs should delete the GUFI SQLite3 build/install and rebuild

GitHub Actions:

  • Removed macOS 11 and 13
  • Added macOS 14
  • Added Ubuntu 24.04
  • Now uploading PDFs and RPMs to tagged releases
  • Added test on Windows with cygwin
  • An update to the Codecov action is causing issues, so upload failures no longer cause the workflow to error

0.6.3

14 Feb 20:19

gufi_query:

  • Input paths can now be symlinks
  • Immediate subdirectories of input paths can now be symlinks
  • gufi_query uses the realpath of the top-level input paths for traversal, but the custom SQLite functions path and rpath print the current path with the original user-provided prefix
    • fpath still prints the actual path
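The split between the traversal path and the displayed path amounts to swapping the resolved prefix for the one the user typed. This is a hypothetical sketch of that idea, not GUFI's implementation. It assumes `real_prefix` is `os.path.realpath(user_prefix)` and that the current path lies under it.

```python
import os


def display_path(user_prefix: str, real_prefix: str, current: str) -> str:
    """Rewrite a path found under real_prefix so it starts with user_prefix.

    Assumes real_prefix == os.path.realpath(user_prefix) and that current
    lies inside real_prefix (the traversal walks the resolved location).
    """
    rel = os.path.relpath(current, real_prefix)
    return user_prefix if rel == "." else os.path.join(user_prefix, rel)
```

For example, if /idx/link resolves to /data/idx, the directory /data/idx/a/b is reported as /idx/link/a/b, which is the analogue of path and rpath. An fpath-style function would return the resolved path unchanged.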

Schema changes:

  • Added ppinode to pentries
  • dmaxgidI → dmaxgid
  • inode INT64 → inode TEXT

New: contrib/longitudinal_snapshot.py

  • Takes a snapshot of an index tree, summarizes each directory's metadata (e.g., min, max, mean, median, and histograms of file size, file count, timestamps, string lengths, etc.), and places the data into a single SQLite database file that is much smaller than the index itself
    • Recommend running gufi_treesummary_all before generating a snapshot
  • See Discussion in #149
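The core idea, one compact summary row per directory instead of one row per file, can be sketched with the standard `statistics` and `sqlite3` modules. The table and column names are hypothetical and are not the schema produced by longitudinal_snapshot.py. Only a few of the listed statistics are shown.

```python
import sqlite3
import statistics


def summarize(values):
    """Summary statistics for one directory's values (e.g. file sizes)."""
    if not values:
        return None
    return (min(values), max(values), statistics.mean(values),
            statistics.median(values))


def snapshot(per_dir: dict, db_path: str = ":memory:") -> sqlite3.Connection:
    """Store one summary row per directory in a single SQLite database.

    The table and column names are hypothetical, not GUFI's snapshot schema.
    """
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE snapshot "
                 "(path TEXT, min_size INT, max_size INT, "
                 "mean_size REAL, median_size REAL)")
    for path, sizes in per_dir.items():
        row = summarize(sizes)
        if row is not None:  # directories without files are skipped
            conn.execute("INSERT INTO snapshot VALUES (?, ?, ?, ?, ?)",
                         (path, *row))
    conn.commit()
    return conn
```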

New: contrib/treediff

  • Walks directory tree and prints top-most directory mismatches
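"Top-most mismatch" means that once a directory differs, its subtree is not examined further. The sketch below compares only entry names and whether each entry is a directory. That comparison criterion is an assumption; the real tool may compare more metadata.

```python
import os


def listing(path: str) -> list:
    """Sorted (name, is_directory) pairs for one directory; symlinks not followed."""
    with os.scandir(path) as entries:
        return sorted((e.name, e.is_dir(follow_symlinks=False)) for e in entries)


def treediff(left: str, right: str, rel: str = ".") -> list:
    """Relative paths of the top-most directories whose listings differ."""
    here = listing(os.path.normpath(os.path.join(left, rel)))
    if here != listing(os.path.normpath(os.path.join(right, rel))):
        # Report the mismatch here and do not descend any further.
        return [rel]
    diffs = []
    for name, is_dir in here:
        if is_dir:
            diffs.extend(treediff(left, right,
                                  os.path.normpath(os.path.join(rel, name))))
    return diffs
```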

More tests

  • Added empty directory to test tree
  • Added deploy test

GitHub Actions

  • macOS 11 → macOS 14
  • PDF documentation built from the main branch is now kept as an artifact for 14 days

0.6.2

20 Dec 20:09

Schema Changes:

  • summary now has a column called totzero that counts files of size 0
  • New views summarylong and vrsummarylong join summary and vrsummary with tables/views that contain additional data that should be associated with them but do not need to be added into the summary table. Currently, no extra information is attached.

${SEARCH} now contains an empty db.db to guarantee a db.db above all indexes under ${SEARCH}.

  • This can be expanded in the future to add information that is separate from the rest of the index tree.
  • Fixes #49

NEW: gufi_treesummary_all generates treesummary tables for all directories in an index instead of one directory at a time.

gufi_rollup now also generates treesummary tables while processing index

  • gufi_unrollup does not remove treesummary tables because there is no way to tell whether they were generated by gufi_rollup. A column recording which utility generated them might be added in the future.

gufi_stat → gufi_stat_bin

  • gufi_stat is now a script that calls gufi_stat_bin
  • Server configuration file now also needs the path to gufi_stat_bin

gufi_stats

  • average-leaf-files
  • average-leaf-links
  • average-leaf-size
  • median-leaf-links
  • median-leaf-size
  • filesize-log2-bins
  • filesize-log1024-bins
  • dirfilecount-log2-bins (#146)
  • dirfilecount-log1024-bins (#146)
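Logarithmic bins can be computed exactly with integer arithmetic. For size ≥ 1, floor(log2(size)) is `size.bit_length() - 1`. Because 1024 = 2^10, the log1024 bin is the log2 bin divided by 10, rounded down. This is an illustrative sketch. The placement of zero-size entries in a separate `None` bin is an assumption, and the exact bin boundaries gufi_stats uses are not confirmed here.

```python
from collections import Counter


def log2_bin(size: int) -> int:
    """floor(log2(size)) for size >= 1, computed exactly with integers."""
    return size.bit_length() - 1


def log1024_bin(size: int) -> int:
    """floor(log1024(size)): 1024 == 2**10, so divide the log2 bin by 10."""
    return log2_bin(size) // 10


def histogram(sizes, bin_fn) -> dict:
    """Count sizes per bin; zero-size entries go in a separate None bin."""
    return dict(Counter(None if s == 0 else bin_fn(s) for s in sizes))
```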

Scripts now have a --verbose/-V flag to print the command being run (#142)

bfwreaddirplus2db was reorganized.

NEW: split_trace splits trace files into chunks for parallel processing by gufi_trace2index

SQLite3

  • Updated from version 3.27 to version 3.43 to get built-in math functions
    • Existing indexes should be rebuilt
  • Also added math functions stdevs, stdevp, and median
  • Replaced subdirs_walked() with subdirs(srollsubdirs, sroll)
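stdevs and stdevp differ only in the divisor: stdevs uses n - 1 (sample) and stdevp uses n (population). The sketch below registers aggregates with those names through Python's `sqlite3.Connection.create_aggregate`. These are stand-ins that illustrate the semantics, not GUFI's C implementations. Returning NULL for inputs that are too small is an assumption.

```python
import sqlite3
import statistics


class _Collect:
    """Gathers non-NULL values; subclasses compute the final statistic."""

    def __init__(self):
        self.values = []

    def step(self, value):
        if value is not None:
            self.values.append(value)


class StdevS(_Collect):
    def finalize(self):  # sample standard deviation, divisor n - 1
        return statistics.stdev(self.values) if len(self.values) > 1 else None


class StdevP(_Collect):
    def finalize(self):  # population standard deviation, divisor n
        return statistics.pstdev(self.values) if self.values else None


class Median(_Collect):
    def finalize(self):
        return statistics.median(self.values) if self.values else None


conn = sqlite3.connect(":memory:")
for name, cls in (("stdevs", StdevS), ("stdevp", StdevP), ("median", Median)):
    conn.create_aggregate(name, 1, cls)
```

After registration, a query such as `SELECT stdevs(x), stdevp(x), median(x) FROM t` works like any built-in aggregate.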

When printing result columns, the delimiter after the last column is no longer printed

  • Prevents pandas from unnecessarily generating a column of Nones when parsing output

Significant increase in testing and code coverage

CMake

  • db2, fuse, and gpfs tool building can be disabled even if the libraries are found
  • Added make pylint, make shellcheck, and make checkstyle
  • gufi_client_jail is no longer created (it should not have been)
  • Example configuration files are no longer renamed to config.example

GitHub Actions:

  • Now building on macOS 11, 12, and 13
  • Now building with -Wall -Wextra -Werror -pedantic

Added cygwin GCC support (not tested with CI)

0.6.1

30 May 15:37

Reduced size of struct work
Added optional work compression with zlib to gufi_dir2index, gufi_dir2trace, and gufi_query
Added in-situ processing of work items in the descend function: after n directories have been enqueued, the remaining immediate subdirectories are processed in the parent thread instead of being enqueued
gufi_query no longer requires at least one of -T, -S, or -E
Changed gufi_trace2index to read from file descriptors using pread(2) instead of FILE *s with getline(3)
Removed BENCHMARK macro
Documentation and test updates
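The in-situ rule in descend can be sketched as a loop that stops enqueuing after n items. The names `descend`, `process`, and `max_enqueue` are hypothetical, and the real code is C built on QueuePerThreadPool.

```python
from collections import deque


def descend(subdirs, queue: deque, process, max_enqueue: int) -> None:
    """Enqueue the first max_enqueue subdirectories; process the rest here.

    Subdirectories past the limit skip the queue entirely and are handled
    by the calling (parent) thread.
    """
    for i, d in enumerate(subdirs):
        if i < max_enqueue:
            queue.append(d)
        else:
            process(d)
```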

QueuePerThreadPool

  • Added a soft memory limit via deferred processing
    • If a thread's wait queue gets too big, new work items are placed in a different queue so they are not processed until the wait queue is empty
    • QPTPool_enqueue now returns whether the new work item was placed in the wait queue or in the deferred queue
  • QPTPool_init now only requires thread count and thread arguments to initialize
    • The other properties can be optionally set with setter functions
    • Previous QPTPool_init has been renamed to QPTPool_init_with_props
  • Symmetrical start up (QPTPool_init and QPTPool_start) and end (QPTPool_wait and QPTPool_destroy)
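Deferred processing can be modeled as two queues per thread, where enqueue reports which queue received the item. This sketch measures the wait queue by item count, which is an assumption; the actual limit concerns memory use. `ThreadQueues` and `Placed` are hypothetical names, not the QPTPool API.

```python
from collections import deque
from enum import Enum


class Placed(Enum):
    WAIT = "wait"
    DEFERRED = "deferred"


class ThreadQueues:
    """One thread's wait queue plus a deferred queue (sketch)."""

    def __init__(self, soft_limit: int):
        self.soft_limit = soft_limit
        self.wait = deque()
        self.deferred = deque()

    def enqueue(self, item) -> Placed:
        # Report where the item went, like QPTPool_enqueue's return value.
        if len(self.wait) < self.soft_limit:
            self.wait.append(item)
            return Placed.WAIT
        self.deferred.append(item)
        return Placed.DEFERRED

    def next_item(self):
        # Deferred items are only taken once the wait queue is empty.
        if self.wait:
            return self.wait.popleft()
        if self.deferred:
            return self.deferred.popleft()
        return None
```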

SQLite3

  • Renamed path() to rpath()
    • Returns full path properly for original and rolled up indices
    • Use with new views vrsummary, vrpentries, vrxsummary, and vrxpentries
  • Restored path(), epath(), and fpath() functions
  • Removed alignment arguments from functions
  • Updated URI processing to replace percent characters
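In an SQLite URI filename, `%HH` sequences are decoded. A literal `%` in a path therefore has to be written as `%25`, or the path will be misread. The sketch below illustrates this with Python's `sqlite3` and `urllib.parse.quote`, which also escapes `?` and `#`. This swapped-in approach shows the principle, not GUFI's C code, and it assumes POSIX-style absolute paths.

```python
import sqlite3
from urllib.parse import quote


def open_readonly(path: str) -> sqlite3.Connection:
    """Open an SQLite database read-only through a URI filename.

    SQLite decodes %HH escapes in URIs, so a literal '%' in the path must
    become %25; quote() does this and leaves '/' unescaped.
    """
    uri = "file:" + quote(path) + "?mode=ro"
    return sqlite3.connect(uri, uri=True)
```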

Renamed

  • bfti → gufi_treesummary
  • rollup → gufi_rollup
  • unrollup → gufi_unrollup

Performance History Framework

  • Added helper script that allows the user to specify a range of commits and how many times to benchmark each commit
    • Downloads second copy of repo
  • Added support for collecting new/renamed/removed cumulative_times debug values for gufi_query in older commits
  • Plotting supports including or excluding commits without data
  • More documentation

Removed INSTALL, NOTES.txt, Makefiles, and bfmi
Added SQL guide
Added presentation from MSST 2023

0.6.0

19 Dec 23:39

Extended Attributes (xattrs) support

  • Secure storage and retrieval of user data
  • Generic permission/user/group "external data" framework
  • Can be rolled up

gufi_query

  • Moved into its own directory
  • Reorganized into smaller files
  • Output targets with and without aggregation are now clearly defined

Enabled GitHub Actions

  • Removed Travis CI
  • Automatic testing on multiple OSs
  • RPM packages
  • Run pylint and shellcheck to clean up scripts
  • Run valgrind to check for memory leaks
  • Code scanning with CodeQL
  • Code coverage with CodeCov

SQLite3

  • Now handling URI paths
  • Updated path() function
  • Removed fpath()
  • Removed epath()

Documentation

  • Added user, administrator, and developer guides (LaTeX)
  • Added PDFs of slides from presentations
  • Added citation to SC22 paper

Updated QueuePerThreadPool API
Added ability to skip directories listed in a file with -k
Added performance history collection framework
All Python code should work with Python 2 and Python 3
Increased testing

XATTR

13 Apr 21:33
Pre-release
0.5.2-rc2

Added CentOS 7 Docker image

XAttr

01 Apr 15:23
Pre-release

Initial implementation of querying with XAttrs

0.5.2-rc0

31 Mar 23:34

Added

  • rollup executable to reduce the number of opens that need to be done during a tree walk.
  • unrollup to remove rollup information from a rolled up tree.
  • parallel_rmr to delete trees in parallel.

0.5.1-rc0

07 Oct 18:05

Renamed

  • contrib/benchmark.sh → contrib/canned_queries.sh
  • scripts/query_builder.py → scripts/gufi_common.py

Internal Interface Changes

  • gufi_query no longer has the options -P and -p
    • Updated scripts and tests to not pass these flags
  • Removed unnecessary arguments from dbutils sqlite3 wrapper functions
  • OutputBuffers now has functions to flush to FILE * and FILE **
  • QueuePerThreadPool
    • No longer has thread pinning as part of the interface
    • Fixed bug with terminating condition
      • QPTPool_start can now be called before any items are placed in the work queue
    • The function passed to QPTPool_start is now the default function to run
      • A function can also be provided to QPTPool_enqueue to change the behavior for that one work item
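The pattern of a default function set at start plus an optional per-item override can be sketched in a single thread. `Pool`, `start`, `enqueue`, and `wait` are hypothetical stand-ins for QPTPool_start and QPTPool_enqueue, not their C signatures. Note that starting before any work is enqueued is allowed.

```python
from collections import deque


class Pool:
    """Single-threaded sketch: default function plus per-item overrides."""

    def __init__(self):
        self.queue = deque()
        self.default_fn = None

    def start(self, default_fn):
        # May be called before anything has been enqueued.
        self.default_fn = default_fn

    def enqueue(self, item, fn=None):
        # fn, if given, replaces the default for this one item only.
        self.queue.append((item, fn))

    def wait(self) -> list:
        results = []
        while self.queue:
            item, fn = self.queue.popleft()
            results.append((fn or self.default_fn)(item))
        return results
```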

Updated travis scripts
Added --path-index to gufi_find to allow for selection of a single path from the list of paths provided in the config file.
Added debug prints to gufi_trace2index
Debug prints in gufi_query and gufi_trace2index go through OutputBuffers, so they have very little overhead.