fix cache_file_name docstring to make it explicit that it is a path
lhoestq committed Jan 13, 2021
1 parent e36d466 commit 42ccc00
Showing 2 changed files with 14 additions and 14 deletions.
20 changes: 10 additions & 10 deletions src/datasets/arrow_dataset.py
@@ -1163,7 +1163,7 @@ def map(
keep_in_memory (`bool`, defaults to `False`): Keep the dataset in memory instead of writing it to a cache file.
load_from_cache_file (`bool`, defaults to `True`): If a cache file storing the current computation from `function`
can be identified, use it instead of recomputing.
- cache_file_name (`Optional[str]`, defaults to `None`): Provide the name of a cache file to use to store the
+ cache_file_name (`Optional[str]`, defaults to `None`): Provide the name of a path for the cache file. It is used to store the
results of the computation instead of the automatically generated cache file name.
writer_batch_size (`int`, defaults to `1000`): Number of rows per write operation for the cache file writer.
Higher value gives smaller cache files, lower value consume less temporary memory while running `.map()`.
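To make the clarified wording concrete: `cache_file_name` expects a path to the cache file (for example `/tmp/imdb_with_lengths.arrow`), not just a bare file name. A minimal sketch follows; the `imdb` dataset and the toy mapped function are purely illustrative and not part of this commit.

```python
from datasets import load_dataset

# Illustrative only: the "imdb" dataset and this toy function are not part of the commit.
dataset = load_dataset("imdb", split="train")

def add_length(example):
    # Add a derived column so that map() has something to cache.
    example["text_length"] = len(example["text"])
    return example

# cache_file_name is a path to the cache file, not just a bare file name.
dataset = dataset.map(
    add_length,
    cache_file_name="/tmp/imdb_with_lengths.arrow",  # overrides the auto-generated cache file name
    load_from_cache_file=True,  # reuse this file on identical later calls
    writer_batch_size=1000,     # rows per write: higher -> smaller files, lower -> less temporary memory
)
```

Rerunning the same call with `load_from_cache_file=True` reuses the file at that path instead of recomputing.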
@@ -1369,7 +1369,7 @@ def _map_single(
keep_in_memory (`bool`, defaults to `False`): Keep the dataset in memory instead of writing it to a cache file.
load_from_cache_file (`bool`, defaults to `True`): If a cache file storing the current computation from `function`
can be identified, use it instead of recomputing.
- cache_file_name (`Optional[str]`, defaults to `None`): Provide the name of a cache file to use to store the
+ cache_file_name (`Optional[str]`, defaults to `None`): Provide the name of a path for the cache file. It is used to store the
results of the computation instead of the automatically generated cache file name.
writer_batch_size (`int`, defaults to `1000`): Number of rows per write operation for the cache file writer.
Higher value gives smaller cache files, lower value consume less temporary memory while running `.map()`.
@@ -1590,7 +1590,7 @@ def filter(
keep_in_memory (`bool`, defaults to `False`): Keep the dataset in memory instead of writing it to a cache file.
load_from_cache_file (`bool`, defaults to `True`): If a cache file storing the current computation from `function`
can be identified, use it instead of recomputing.
- cache_file_name (`Optional[str]`, defaults to `None`): Provide the name of a cache file to use to store the
+ cache_file_name (`Optional[str]`, defaults to `None`): Provide the name of a path for the cache file. It is used to store the
results of the computation instead of the automatically generated cache file name.
writer_batch_size (`int`, defaults to `1000`): Number of rows per write operation for the cache file writer.
Higher value gives smaller cache files, lower value consume less temporary memory while running `.map()`.
@@ -1660,7 +1660,7 @@ def flatten_indices(
Args:
keep_in_memory (`bool`, default: `False`): Keep the dataset in memory instead of writing it to a cache file.
- cache_file_name (`Optional[str]`, defaults to `None`): Provide the name of a cache file to use to store the
+ cache_file_name (`Optional[str]`, defaults to `None`): Provide the name of a path for the cache file. It is used to store the
results of the computation instead of the automatically generated cache file name.
writer_batch_size (`int`, defaults to `1000`): Number of rows per write operation for the cache file writer.
Higher value gives smaller cache files, lower value consume less temporary memory while running `.map()`.
@@ -1736,7 +1736,7 @@ def select(
Args:
`indices` (sequence, iterable, ndarray or Series): List or 1D-array of integer indices for indexing.
`keep_in_memory` (`bool`, default: `False`): Keep the indices mapping in memory instead of writing it to a cache file.
- `indices_cache_file_name` (`Optional[str]`, default: `None`): Provide the name of a cache file to use to store the
+ `indices_cache_file_name` (`Optional[str]`, default: `None`): Provide the name of a path for the cache file. It is used to store the
indices mapping instead of the automatically generated cache file name.
`writer_batch_size` (`int`, default: `1000`): Number of rows per write operation for the cache file writer.
Higher value gives smaller cache files, lower value consume less temporary memory while running `.map()`.
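The same convention applies to the indices-mapping variants: `indices_cache_file_name` is a path where `select` writes its indices mapping (`sort` and `shuffle` follow the same pattern). A hedged sketch with a small stand-in dataset:

```python
from datasets import Dataset

# Small in-memory stand-in dataset (illustrative, not from the commit).
dataset = Dataset.from_dict({"idx": list(range(100))})

# indices_cache_file_name is a path where the indices mapping is written,
# instead of the automatically generated cache file name.
subset = dataset.select(
    range(50),
    indices_cache_file_name="/tmp/first_half_indices.arrow",
)
```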
@@ -1830,7 +1830,7 @@ def sort(
keep_in_memory (`bool`, defaults to `False`): Keep the sorted indices in memory instead of writing it to a cache file.
load_from_cache_file (`bool`, defaults to `True`): If a cache file storing the sorted indices
can be identified, use it instead of recomputing.
- indices_cache_file_name (`Optional[str]`, defaults to `None`): Provide the name of a cache file to use to store the
+ indices_cache_file_name (`Optional[str]`, defaults to `None`): Provide the name of a path for the cache file. It is used to store the
sorted indices instead of the automatically generated cache file name.
writer_batch_size (`int`, defaults to `1000`): Number of rows per write operation for the cache file writer.
Higher value gives smaller cache files, lower value consume less temporary memory.
@@ -1906,7 +1906,7 @@ def shuffle(
keep_in_memory (`bool`, defaults to `False`): Keep the shuffled indices in memory instead of writing it to a cache file.
load_from_cache_file (`bool`, defaults to `True`): If a cache file storing the shuffled indices
can be identified, use it instead of recomputing.
- indices_cache_file_name (`Optional[str]`, defaults to `None`): Provide the name of a cache file to use to store the
+ indices_cache_file_name (`Optional[str]`, defaults to `None`): Provide the name of a path for the cache file. It is used to store the
shuffled indices instead of the automatically generated cache file name.
writer_batch_size (`int`, defaults to `1000`): Number of rows per write operation for the cache file writer.
Higher value gives smaller cache files, lower value consume less temporary memory while running `.map()`.
@@ -1998,9 +1998,9 @@ def train_test_split(
keep_in_memory (`bool`, defaults to `False`): Keep the splits indices in memory instead of writing it to a cache file.
load_from_cache_file (`bool`, defaults to `True`): If a cache file storing the splits indices
can be identified, use it instead of recomputing.
- train_cache_file_name (`Optional[str]`, defaults to `None`): Provide the name of a cache file to use to store the
+ train_cache_file_name (`Optional[str]`, defaults to `None`): Provide the name of a path for the cache file. It is used to store the
train split indices instead of the automatically generated cache file name.
- test_cache_file_name (`Optional[str]`, defaults to `None`): Provide the name of a cache file to use to store the
+ test_cache_file_name (`Optional[str]`, defaults to `None`): Provide the name of a path for the cache file. It is used to store the
test split indices instead of the automatically generated cache file name.
writer_batch_size (`int`, defaults to `1000`): Number of rows per write operation for the cache file writer.
Higher value gives smaller cache files, lower value consume less temporary memory while running `.map()`.
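For `train_test_split`, one path is given per resulting split. The keyword names in the sketch below follow the docstring shown in this diff and may be named differently in other versions of the library; the dataset is a stand-in:

```python
from datasets import Dataset

dataset = Dataset.from_dict({"idx": list(range(100))})  # illustrative stand-in

# One cache path per resulting split; keyword names follow the docstring in this diff
# and may differ in other library versions.
splits = dataset.train_test_split(
    test_size=0.2,
    train_cache_file_name="/tmp/my_train_indices.arrow",
    test_cache_file_name="/tmp/my_test_indices.arrow",
)
```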
@@ -2183,7 +2183,7 @@ def shard(
keep_in_memory (`bool`, defaults to `False`): Keep the dataset in memory instead of writing it to a cache file.
load_from_cache_file (`bool`, defaults to `True`): If a cache file storing the current computation from `function`
can be identified, use it instead of recomputing.
- indices_cache_file_name (`Optional[str]`, defaults to `None`): Provide the name of a cache file to use to store the
+ indices_cache_file_name (`Optional[str]`, defaults to `None`): Provide the name of a path for the cache file. It is used to store the
indices of each shard instead of the automatically generated cache file name.
writer_batch_size (`int`, defaults to `1000`): Number of rows per write operation for the cache file writer.
Higher value gives smaller cache files, lower value consume less temporary memory while running `.map()`.
8 changes: 4 additions & 4 deletions src/datasets/dataset_dict.py
@@ -267,7 +267,7 @@ def map(
keep_in_memory (`bool`, defaults to `False`): Keep the dataset in memory instead of writing it to a cache file.
load_from_cache_file (`bool`, defaults to `True`): If a cache file storing the current computation from `function`
can be identified, use it instead of recomputing.
- cache_file_names (`Optional[Dict[str, str]]`, defaults to `None`): Provide the name of a cache file to use to store the
+ cache_file_names (`Optional[Dict[str, str]]`, defaults to `None`): Provide the name of a path for the cache file. It is used to store the
results of the computation instead of the automatically generated cache file name.
You have to provide one :obj:`cache_file_name` per dataset in the dataset dictionary.
writer_batch_size (`int`, defaults to `1000`): Number of rows per write operation for the cache file writer.
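At the `DatasetDict` level the parameter becomes a dict: `cache_file_names` maps each split name to a cache file path, one entry per dataset in the dictionary. A minimal sketch, assuming two illustrative in-memory splits:

```python
from datasets import Dataset, DatasetDict

# Illustrative stand-in splits (not from the commit).
dset_dict = DatasetDict({
    "train": Dataset.from_dict({"text": ["a short example", "another one"]}),
    "test": Dataset.from_dict({"text": ["a held-out example"]}),
})

def add_length(example):
    example["text_length"] = len(example["text"])
    return example

# One cache file path per dataset in the dictionary, keyed by split name.
dset_dict = dset_dict.map(
    add_length,
    cache_file_names={
        "train": "/tmp/train_with_lengths.arrow",
        "test": "/tmp/test_with_lengths.arrow",
    },
)
```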
@@ -337,7 +337,7 @@ def filter(
keep_in_memory (`bool`, defaults to `False`): Keep the dataset in memory instead of writing it to a cache file.
load_from_cache_file (`bool`, defaults to `True`): If a cache file storing the current computation from `function`
can be identified, use it instead of recomputing.
- cache_file_names (`Optional[Dict[str, str]]`, defaults to `None`): Provide the name of a cache file to use to store the
+ cache_file_names (`Optional[Dict[str, str]]`, defaults to `None`): Provide the name of a path for the cache file. It is used to store the
results of the computation instead of the automatically generated cache file name.
You have to provide one :obj:`cache_file_name` per dataset in the dataset dictionary.
writer_batch_size (`int`, defaults to `1000`): Number of rows per write operation for the cache file writer.
@@ -394,7 +394,7 @@ def sort(
keep_in_memory (`bool`, defaults to `False`): Keep the dataset in memory instead of writing it to a cache file.
load_from_cache_file (`bool`, defaults to `True`): If a cache file storing the current computation from `function`
can be identified, use it instead of recomputing.
- indices_cache_file_names (`Optional[Dict[str, str]]`, defaults to `None`): Provide the name of a cache file to use to store the
+ indices_cache_file_names (`Optional[Dict[str, str]]`, defaults to `None`): Provide the name of a path for the cache file. It is used to store the
indices mapping instead of the automatically generated cache file name.
You have to provide one :obj:`cache_file_name` per dataset in the dataset dictionary.
writer_batch_size (`int`, defaults to `1000`): Number of rows per write operation for the cache file writer.
@@ -446,7 +446,7 @@ def shuffle(
keep_in_memory (`bool`, defaults to `False`): Keep the dataset in memory instead of writing it to a cache file.
load_from_cache_file (`bool`, defaults to `True`): If a cache file storing the current computation from `function`
can be identified, use it instead of recomputing.
- indices_cache_file_names (`Optional[Dict[str, str]]`, default: `None`): Provide the name of a cache file to use to store the
+ indices_cache_file_names (`Optional[Dict[str, str]]`, default: `None`): Provide the name of a path for the cache file. It is used to store the
indices mappings instead of the automatically generated cache file name.
You have to provide one :obj:`cache_file_name` per dataset in the dataset dictionary.
writer_batch_size (`int`, defaults to `1000`): Number of rows per write operation for the cache file writer.
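Likewise for `indices_cache_file_names` on `DatasetDict.shuffle`: one indices-mapping path per split. A minimal sketch under the same assumptions as the previous example:

```python
from datasets import Dataset, DatasetDict

dset_dict = DatasetDict({
    "train": Dataset.from_dict({"idx": list(range(10))}),
    "test": Dataset.from_dict({"idx": list(range(5))}),
})

# One indices-mapping path per split; the shuffled order is written there
# instead of to an automatically named cache file.
shuffled = dset_dict.shuffle(
    indices_cache_file_names={
        "train": "/tmp/train_shuffle_indices.arrow",
        "test": "/tmp/test_shuffle_indices.arrow",
    },
)
```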
