
Develop new Koza API + general refactoring #163

Merged
kevinschaper merged 55 commits into main from koza-api-new
Jul 10, 2025

Conversation

ptgolden (Member) commented Jan 14, 2025

This is a huge commit, because it was hard to change one part of Koza without changing everything.

Here are the headlines:

Change the API for writing transforms

The best way to see this is in the diffs for the examples. Take the example string-w-map transform. (Here's the split diff for convenience)

Old API:

import uuid

from biolink_model.datamodel.pydanticmodel_v2 import Gene, PairwiseGeneToGeneInteraction

from koza.cli_utils import get_koza_app

source_name = "map-protein-links-detailed"
map_name = "entrez-2-string"

koza_app = get_koza_app(source_name)
row = koza_app.get_row()
koza_map = koza_app.get_map(map_name)

from loguru import logger

logger.info(koza_map)

gene_a = Gene(id="NCBIGene:" + koza_map[row["protein1"]]["entrez"])
gene_b = Gene(id="NCBIGene:" + koza_map[row["protein2"]]["entrez"])

pairwise_gene_to_gene_interaction = PairwiseGeneToGeneInteraction(
    id="uuid:" + str(uuid.uuid1()),
    subject=gene_a.id,
    object=gene_b.id,
    predicate="biolink:interacts_with",
    knowledge_level="not_provided",
    agent_type="not_provided",
)

koza_app.write(gene_a, gene_b, pairwise_gene_to_gene_interaction)

New API:

import uuid

from biolink_model.datamodel.pydanticmodel_v2 import Gene, PairwiseGeneToGeneInteraction

from koza.runner import KozaTransform


def transform_record(koza: KozaTransform, record: dict):
    a = record["protein1"]
    b = record["protein2"]
    mapped_a = koza.lookup(a, "entrez")
    mapped_b = koza.lookup(b, "entrez")
    gene_a = Gene(id="NCBIGene:" + mapped_a)
    gene_b = Gene(id="NCBIGene:" + mapped_b)

    pairwise_gene_to_gene_interaction = PairwiseGeneToGeneInteraction(
        id="uuid:" + str(uuid.uuid1()),
        subject=gene_a.id,
        object=gene_b.id,
        predicate="biolink:interacts_with",
        knowledge_level="not_provided",
        agent_type="not_provided",
    )

    koza.write(gene_a, gene_b, pairwise_gene_to_gene_interaction)

Note a few things here:

  1. No more song-and-dance preamble in the transform (import cli_utils, koza_app = get_koza_app(), koza_app.get_map(), koza_app.get_row(), and so on). Instead, you just write one function, called transform_record, which is called with two arguments: koza (which provides koza.write for writing, koza.lookup for mapping, and a few other things) and record (the dict representing a row in a CSV file, a line in JSON lines, or an object in a JSON array).
  2. Following from (1), we do not assume that this file will be reloaded on every record. Instead, one function is run over and over for each record. This should be a more intuitive way to think about how Koza transforms run. (It may pave the way for transforms that run in parallel, too.)
  3. Map lookups are done with a function (koza.lookup(term, map_name)) rather than by indexing into a nested dict (koza_map[term]["entrez"]).
  4. Abstractly, more functionality could be added to the koza argument; it's a central place for functionality provided to a transform.

This was an example of what we used to call a "flat" transform (note that its config below sets transform_mode: 'flat'). Here's an example of a "loop" transform. We can take examples/string/protein-links-detailed.py as an example.

Old:

import re
import uuid

from biolink_model.datamodel.pydanticmodel_v2 import PairwiseGeneToGeneInteraction, Protein

from koza.cli_utils import get_koza_app

koza_app = get_koza_app('protein-links-detailed')

while (row := koza_app.get_row()) is not None:
    protein_a = Protein(id='ENSEMBL:' + re.sub(r'\d+\.', '', row['protein1']))
    protein_b = Protein(id='ENSEMBL:' + re.sub(r'\d+\.', '', row['protein2']))

    pairwise_gene_to_gene_interaction = PairwiseGeneToGeneInteraction(
        id="uuid:" + str(uuid.uuid1()),
        subject=protein_a.id,
        object=protein_b.id,
        predicate="biolink:interacts_with",
        knowledge_level="not_provided",
        agent_type="not_provided",
    )

    koza_app.write(protein_a, protein_b, pairwise_gene_to_gene_interaction)

New:

import re
import uuid

from biolink_model.datamodel.pydanticmodel_v2 import PairwiseGeneToGeneInteraction, Protein

from koza.runner import KozaTransform


def transform(koza: KozaTransform):
    for row in koza.data:
        protein_a = Protein(id="ENSEMBL:" + re.sub(r"\d+\.", "", row["protein1"]))
        protein_b = Protein(id="ENSEMBL:" + re.sub(r"\d+\.", "", row["protein2"]))

        pairwise_gene_to_gene_interaction = PairwiseGeneToGeneInteraction(
            id="uuid:" + str(uuid.uuid1()),
            subject=protein_a.id,
            object=protein_b.id,
            predicate="biolink:interacts_with",
            knowledge_level="not_provided",
            agent_type="not_provided",
        )

        koza.write(protein_a, protein_b, pairwise_gene_to_gene_interaction)

About the same as the previous example, but this time all of the functionality for the transform is in a function named transform.

In the old API, "loop" and "flat" transforms were differentiated in the YAML config. Now, the mode is determined by the name of the function you define in your transform module. In both cases, the transform module is loaded only once, whether in what we used to call "loop" or "flat" mode. But:

  • If there is a function named transform, it will be run once, and it's up to the consumer to read all the data. (This is equivalent to the old "loop" mode, where the module contained its own loop over rows.)
  • If there is a function named transform_record, it will be run for every record from every reader. (Equivalent to the old "flat" mode, where the module was re-executed for each row.)

If neither or both of these functions are defined, an error is raised and Koza bails out.
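As a minimal sketch of that dispatch rule (the function names come from the PR; the resolver helper itself is hypothetical, not Koza's actual code):

```python
import types

def resolve_transform(module: types.ModuleType):
    """Pick the entry point from a transform module, enforcing exactly one."""
    transform = getattr(module, "transform", None)
    transform_record = getattr(module, "transform_record", None)
    if transform is not None and transform_record is not None:
        raise ValueError("Can only define one of `transform` or `transform_record`")
    if transform is None and transform_record is None:
        raise ValueError("Must define one of `transform` or `transform_record`")
    return transform or transform_record

# Usage: build a module with a single entry point and resolve it
mod = types.ModuleType("example_transform")
mod.transform = lambda koza: None
assert resolve_transform(mod) is mod.transform
```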

Major overhaul of the runner

To support this new API, there is a (basically) completely new runner class.

At a high level, here's how Koza used to work:

In "flat" mode:

  1. Load the Python module defined in the transform.
  2. Assume that this module has top-level code that calls koza_app.get_row() and that loading it will have side effects. Let all the code run.
  3. Reload the module and go to (2) until one of the expected exceptions is thrown: NextRowException (skip ahead to the next row) or StopIteration (all sources have been exhausted). Any other exception is unexpected and stops everything. If neither expected exception is ever thrown (for example, because the module never calls koza_app.get_row()), this loops forever.

In "loop" mode:

  1. Load the Python module defined in the transform.
  2. Assume that the module's top-level code calls koza_app.get_row() repeatedly in its own loop. The module is imported once, and that loop is responsible for reading every row from every source.

This is the main code that did that:

koza/src/koza/app.py

Lines 101 to 124 in 8a3bab9

if self.source.config.transform_mode == 'flat':
    while True:
        try:
            if is_first:
                transform_module = importlib.import_module(transform_code)
                is_first = False
            else:
                importlib.reload(transform_module)
        except MapItemException as mie:
            if self.logger:
                self.logger.debug(f"{str(mie)} not found in map")
        except NextRowException:
            continue
        except ValidationError:
            if self.logger:
                self.logger.error(f"Validation error while processing: {self.source.last_row}")
            raise ValidationError
        except StopIteration:
            break
elif self.source.config.transform_mode == 'loop':
    if transform_code not in sys.modules.keys():
        importlib.import_module(transform_code)
    else:
        importlib.reload(importlib.import_module(transform_code))

The way to run a transform in Python was to run koza.cli_utils.transform_source:

def transform_source(
    source: str,
    output_dir: str,
    output_format: OutputFormat = OutputFormat("tsv"),
    global_table: str = None,
    local_table: str = None,
    schema: str = None,
    node_type: str = None,
    edge_type: str = None,
    row_limit: int = None,
    verbose: bool = None,
    log: bool = False,
):
    """Create a KozaApp object, process maps, and run the transform

    Args:
        source (str): Path to source metadata file
        output_dir (str): Path to output directory
        output_format (OutputFormat, optional): Output format. Defaults to OutputFormat('tsv').
        global_table (str, optional): Path to global translation table. Defaults to None.
        local_table (str, optional): Path to local translation table. Defaults to None.
        schema (str, optional): Path to schema file. Defaults to None.
        row_limit (int, optional): Number of rows to process. Defaults to None.
        verbose (bool, optional): Verbose logging. Defaults to None.
        log (bool, optional): Log to file. Defaults to False.
    """
    logger = get_logger(name=Path(source).name if log else None, verbose=verbose)

    with open(source, "r") as source_fh:
        source_config = PrimaryFileConfig(**yaml.load(source_fh, Loader=UniqueIncludeLoader))

    # Set name and transform code if not provided
    if not source_config.name:
        source_config.name = Path(source).stem

    if not source_config.transform_code:
        filename = f"{Path(source).parent / Path(source).stem}.py"
        if not Path(filename).exists():
            filename = Path(source).parent / "transform.py"
        if not Path(filename).exists():
            raise FileNotFoundError(f"Could not find transform file for {source}")
        source_config.transform_code = filename

    koza_source = Source(source_config, row_limit)
    logger.debug(f"Source created: {koza_source.config.name}")
    translation_table = get_translation_table(
        global_table if global_table else source_config.global_table,
        local_table if local_table else source_config.local_table,
        logger,
    )
    koza_app = _set_koza_app(
        koza_source, translation_table, output_dir, output_format, schema, node_type, edge_type, logger
    )
    koza_app.process_maps()
    koza_app.process_sources()

In the new API, there is a new class, KozaRunner, that takes care of loading a configuration and kicking off the transform. Here is its __init__ function:

koza/src/koza/runner.py

Lines 146 to 155 in 4d5bc95

class KozaRunner:
    def __init__(
        self,
        data: Iterator[Record],
        writer: KozaWriter,
        mapping_filenames: list[str] | None = None,
        extra_transform_fields: dict[str, Any] | None = None,
        transform_record: Callable[[KozaTransform, Record], None] | None = None,
        transform: Callable[[KozaTransform], None] | None = None,
    ):

Note that this doesn't take any configuration file. The normal intended way to instantiate a KozaRunner object is with one of these class methods:

koza/src/koza/runner.py

Lines 257 to 264 in 4d5bc95

@classmethod
def from_config(
    cls,
    config: KozaConfig,
    output_dir: str = "",
    row_limit: int = 0,
    show_progress: bool = False,
):

koza/src/koza/runner.py

Lines 311 to 320 in 4d5bc95

@classmethod
def from_config_file(
    cls,
    config_filename: str,
    output_dir: str = "",
    output_format: OutputFormat | None = None,
    row_limit: int = 0,
    show_progress: bool = False,
    overrides: dict | None = None,
):

When a KozaRunner object is instantiated from a configuration, it looks for a transform:

koza/src/koza/runner.py

Lines 268 to 288 in 4d5bc95

if config.transform.code:
    transform_code_path = Path(config.transform.code)
    parent_path = transform_code_path.absolute().parent
    module_name = transform_code_path.stem
    logger.debug(f"Adding `{parent_path}` to system path to load transform module")
    sys.path.append(str(parent_path))
    # FIXME: Remove this from sys.path
elif config.transform.module:
    module_name = config.transform.module

if module_name:
    logger.debug(f"Loading module `{module_name}`")
    transform_module = importlib.import_module(module_name)

    transform = getattr(transform_module, "transform", None)
    if transform:
        logger.debug(f"Found transform function `{module_name}.transform`")

    transform_record = getattr(transform_module, "transform_record", None)
    if transform_record:
        logger.debug(f"Found transform function `{module_name}.transform_record`")

source = Source(config, row_limit=row_limit, show_progress=show_progress)

(Note that it loads this transform module once, in line 280).

When the KozaRunner object is instantiated, you run the transform with... runner.run():

koza/src/koza/runner.py

Lines 197 to 207 in 4d5bc95

def run(self):
    if callable(self.transform) and callable(self.transform_record):
        raise ValueError("Can only define one of `transform` or `transform_record`")
    elif callable(self.transform):
        self.run_single()
    elif callable(self.transform_record):
        self.run_serial()
    else:
        raise NoTransformException("Must define one of `transform` or `transform_record`")

    self.writer.finalize()

Which just delegates to either run_single (if you have defined a transform function in your module), or run_serial (if you defined transform_record). As mentioned before, transform is called once, whereas transform_record is called for every record defined in the source.

With all of this said, here is how you run Koza from Python with this new API:

from koza import KozaConfig, KozaRunner

runner = KozaRunner.from_config(KozaConfig({
    "reader": { ... },
    "writer": { ... },
    "transform": { ... },
}))
runner.run()

Or, more likely, given a declarative YAML configuration:

from koza import KozaRunner

config, runner = KozaRunner.from_config_file("myconfig.yaml")
runner.run()

That's basically all the command line interface does (along with parsing options and so on). Long story short, I moved almost all the wiring into one class, KozaRunner, and moved a ton of plumbing logic out of the cli_utils module (which was, confusingly, part of the main Koza API).

(Side effect: this is vastly easier to test).

New configuration format

Speaking of configuration: Koza's configuration, which was a massive pile of top-level YAML options, has been reorganized into a nested series of three sections: reader, writer, and transform. Here's a comparison, from examples/string-w-map/map-protein-links-detailed.yaml.

Old config:

name: 'map-protein-links-detailed'

delimiter: ' '

files:
  - './examples/data/string.tsv'
  - './examples/data/string2.tsv'

metadata: !include './examples/string-w-map/metadata.yaml'

columns:
  - 'protein1'
  - 'protein2'
  - 'neighborhood'
  - 'fusion'
  - 'cooccurence'
  - 'coexpression'
  - 'experimental'
  - 'database'
  - 'textmining'
  - 'combined_score' : 'int'

filters:
  - inclusion: 'include'
    column: 'combined_score'
    filter_code: 'lt'
    value: 700

depends_on:
  - './examples/maps/entrez-2-string.yaml'

transform_mode: 'flat'

node_properties:
  - 'id'
  - 'category'
  - 'provided_by'

edge_properties:
  - 'id'
  - 'subject'
  - 'predicate'
  - 'object'
  - 'category'
  - 'relation'
  - 'provided_by'

New config:

name: 'map-protein-links-detailed'

metadata: !include './examples/string-w-map/metadata.yaml'

reader:
  format: csv
  delimiter: ' '
  files:
    - './examples/data/string.tsv'
    - './examples/data/string2.tsv'

  columns:
    - 'protein1'
    - 'protein2'
    - 'neighborhood'
    - 'fusion'
    - 'cooccurence'
    - 'coexpression'
    - 'experimental'
    - 'database'
    - 'textmining'
    - 'combined_score' : 'int'

transform:
  filters:
    - inclusion: 'include'
      column: 'combined_score'
      filter_code: 'lt'
      value: 700
  mappings:
    - './examples/maps/entrez-2-string.yaml'

writer:
  node_properties:
    - 'id'
    - 'category'
    - 'provided_by'

  edge_properties:
    - 'id'
    - 'subject'
    - 'predicate'
    - 'object'
    - 'category'
    - 'relation'
    - 'provided_by'

This may seem like a small change, but compare the old Pydantic class for SourceConfig to the new, separated config classes.
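As a rough sketch of the new shape (the real classes are Pydantic models inside Koza; stdlib dataclasses and guessed field subsets are used here purely for illustration):

```python
from dataclasses import dataclass, field

# Illustrative only: field names are guessed from the YAML examples above,
# and plain dataclasses stand in for the real Pydantic models.
@dataclass
class ReaderConfig:
    format: str = "csv"
    delimiter: str = ","
    files: list = field(default_factory=list)
    columns: list = field(default_factory=list)

@dataclass
class TransformConfig:
    code: str = ""
    module: str = ""
    filters: list = field(default_factory=list)
    mappings: list = field(default_factory=list)

@dataclass
class WriterConfig:
    node_properties: list = field(default_factory=list)
    edge_properties: list = field(default_factory=list)

@dataclass
class KozaConfig:
    name: str
    reader: ReaderConfig = field(default_factory=ReaderConfig)
    transform: TransformConfig = field(default_factory=TransformConfig)
    writer: WriterConfig = field(default_factory=WriterConfig)
```

The point of the shape: each component owns its own options, so a reader never needs to see (or validate) writer options and vice versa.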

This makes it drastically easier to add, remove, or change options. It also greatly simplifies how configurations are passed to readers and writers. Readers should (and now do) only know about ReaderConfigs; writers should (and now do) only know about WriterConfigs. For a dramatic example, check out CSVReader.

Old __init__:

def __init__(
    self,
    io_str: IO[str],
    field_type_map: Dict[str, FieldType] = None,
    delimiter: str = ",",
    header: Union[int, HeaderMode] = HeaderMode.infer,
    header_delimiter: str = None,
    header_prefix: str = None,
    dialect: str = "excel",
    skip_blank_lines: bool = True,
    name: str = "csv file",
    comment_char: str = "#",
    row_limit: int = None,
    *args,
    **kwargs,
):

And here's where it was created:

CSVReader(
    resource_io,
    name=config.name,
    field_type_map=config.field_type_map,
    delimiter=config.delimiter,
    header=config.header,
    header_delimiter=config.header_delimiter,
    header_prefix=config.header_prefix,
    comment_char=self.config.comment_char,
    row_limit=self.row_limit,
)

And here's the __post_init__ method in the parent class that did a lot of validation for CSV configs. (To be clear, this was the configuration for everything, not just CSV readers):

def __post_init__(self):
    # Get files as paths, or extract them from an archive
    if self.file_archive:
        files = self.extract_archive()
    else:
        files = self.files

    files_as_paths: List[Path] = []
    for file in files:
        if isinstance(file, str):
            files_as_paths.append(Path(file))
        else:
            files_as_paths.append(file)
    object.__setattr__(self, "files", files_as_paths)

    # If metadata looks like a file path attempt to load it from the yaml
    if self.metadata and isinstance(self.metadata, str):
        try:
            with open(self.metadata, "r") as meta:
                object.__setattr__(self, "metadata", DatasetDescription(**yaml.safe_load(meta)))
        except Exception as e:
            raise ValueError(f"Unable to load metadata from {self.metadata}: {e}")

    # Format tab as delimiter
    if self.delimiter in ["tab", "\\t"]:
        object.__setattr__(self, "delimiter", "\t")

    # Filter columns
    filtered_columns = [column_filter.column for column_filter in self.filters]

    all_columns = []
    if self.columns:
        all_columns = [next(iter(column)) if isinstance(column, Dict) else column for column in self.columns]

    if self.header == HeaderMode.none and not self.columns:
        raise ValueError(
            "there is no header and columns have not been supplied\n"
            "configure the 'columns' field or set header to the 0-based"
            "index in which it appears in the file, or set this value to"
            "'infer'"
        )

    for column in filtered_columns:
        if column not in all_columns:
            raise (ValueError(f"Filter column {column} not in column list"))

    for column_filter in self.filters:
        if column_filter.filter_code in ["lt", "gt", "lte", "gte"]:
            if not isinstance(column_filter.value, (int, float)):
                raise ValueError(f"Filter value must be int or float for operator {column_filter.filter_code}")
        elif column_filter.filter_code == "eq":
            if not isinstance(column_filter.value, (str, int, float)):
                raise ValueError(
                    f"Filter value must be string, int or float for operator {column_filter.filter_code}"
                )
        elif column_filter.filter_code == "in":
            if not isinstance(column_filter.value, List):
                raise ValueError(f"Filter value must be List for operator {column_filter.filter_code}")

    # Check for conflicting configurations
    if self.format == FormatType.csv and self.required_properties:
        raise ValueError(
            "CSV specified but required properties have been configured\n"
            "Either set format to jsonl or change properties to columns in the config"
        )

    if self.columns and self.format != FormatType.csv:
        raise ValueError(
            "Columns have been configured but format is not csv\n"
            "Either set format to csv or change columns to properties in the config"
        )

    if self.json_path and self.format != FormatType.json:
        raise ValueError(
            "iterate_over has been configured but format is not json\n"
            "Either set format to json or remove iterate_over in the configuration"
        )

    # Create a field_type_map if columns are supplied
    if self.columns:
        field_type_map = {}
        for field in self.columns:
            if isinstance(field, str):
                field_type_map[field] = FieldType.str
            else:
                if len(field) != 1:
                    raise ValueError("Field type map contains more than one key")
                for key, val in field.items():
                    field_type_map[key] = val
        object.__setattr__(self, "field_type_map", field_type_map)

...And in the new API:

def __init__(
    self,
    io_str: IO[str],
    config: CSVReaderConfig,
    *args: Any,
    **kwargs: Any,
):

Instantiation site:

CSVReader(
    resource.reader,
    config=reader_config,
)

Parent model __post_init__:

def __post_init__(self):
    # If metadata looks like a file path attempt to load it from the yaml
    if self.metadata and isinstance(self.metadata, str):
        try:
            with open(self.metadata) as meta:
                object.__setattr__(self, "metadata", DatasetDescription(**yaml.safe_load(meta)))
        except Exception as e:
            raise ValueError(f"Unable to load metadata from {self.metadata}: {e}") from e

    if self.reader.format == InputFormat.csv and self.reader.columns is not None:
        filtered_columns = OrderedSet([column_filter.column for column_filter in self.transform.filters])
        all_columns = OrderedSet(
            [column if isinstance(column, str) else list(column.keys())[0] for column in self.reader.columns]
        )
        extra_filtered_columns = filtered_columns - all_columns
        if extra_filtered_columns:
            quote = "'"
            raise ValueError(
                "One or more filter columns not present in designated CSV columns:"
                f" {', '.join([f'{quote}{c}{quote}' for c in extra_filtered_columns])}"
            )

🎉

Want to write a new writer?

In the old code: add a bunch of options to the top-level SourceConfig Pydantic class, alongside header_mode, filters, transform_module, and the couple dozen other options. Create a writer with an __init__ method that duplicates the names and types of all the options you added to SourceConfig. When you create an instance of that class, duplicate those names a third time. Need to change or add an option? I hope you remembered to make your edits in all three of those places. (Also: you can't rely on Pydantic to mark any of your options as required, because those options will never have values when people are not using your writer. Better do a bunch of validation logic in the writer itself, which won't run until the transform starts. Or maybe you could add to the mess of SourceConfig, piling more logic into its __post_init__.)

In the new code: inherit from WriterConfig and extend it. Have your writer take a config: MyNewWriterConfig parameter. Test it! Good job.
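Sketched out, that workflow might look like the following. The class and field names below are invented for illustration (only the WriterConfig inheritance mirrors the PR's description), and plain dataclasses stand in for the real Pydantic models:

```python
from dataclasses import dataclass, field

@dataclass
class WriterConfig:
    # Stand-in for Koza's base writer config (the real class is a Pydantic model)
    node_properties: list = field(default_factory=list)

@dataclass
class MyNewWriterConfig(WriterConfig):
    # Options specific to this writer live here, and nowhere else
    destination: str = "out.ndjson"
    pretty: bool = False

class MyNewWriter:
    """A hypothetical writer that only ever sees its own config."""

    def __init__(self, config: MyNewWriterConfig):
        self.config = config
        self.rows: list = []

    def write(self, *entities) -> None:
        # Collect entities; a real writer would serialize them
        self.rows.extend(entities)

    def finalize(self) -> None:
        # A real writer would flush self.rows to self.config.destination here
        pass
```

Since the writer's options live in one class, they are declared exactly once, and required fields can be enforced by the config model itself.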

Misc.

  • Better type annotations everywhere.
  • Drop support for Python <3.10, add support for 3.13
  • Add a bunch of tests. (Which in turn led to fixing a number of bugs).
  • Progress bars. Cool! Check out 144ebca.
  • Stream input from compressed files. (i.e. there's no need to extract them first).

Removals

  • No more translation tables (global or local). Use maps or SSSOM instead. (Need to expand on this)
  • No more linkml validation on every row (which, to be clear, caused huge overhead when enabled, and which I haven't seen people use). Instead: transform, then validate (by using linkml yourself). We could add this back in easily.
  • Got rid of lots of unused code. See df7baa2 626979f 645835c 2f3f4f0 e05c813

(...more to come...)

(note to self: no special mapping transform, extra fields in transforms, no difference between files and file_archive)

Patrick Golden added 27 commits January 10, 2025 13:01
Without making any changes to functionality, this separates a koza
configuration into a ReaderConfiguration, TransformConfiguration, and
WriterConfiguration, all contained within a KozaConfiguration.
The big changes are:

  1. Taking in a JSON{,L}ReaderConfig object for all configuration

  2. Defining iteration via `__iter__()` and `yield`
First, replaces the many named parameters with a single CSVReaderConfig
object.

Second, uses `__iter__()` and `yield` to define iteration.

Third, refactors the header consumption and validation code, and wraps
accessing the header in a property on the class.
This adds a new class: KozaRunner, which represents a new way of running
Koza transforms. It is a work in progress and still not at feature
parity with existing transforms.

Essentially, the KozaRunner class takes three parameters:

  1. Data (the data to be transformed)

  2. A function to transform that data, either all at once or row-by-row

  3. A writer that will do something with the transformed output

See the documentation in src/koza/runner.py for more details.
This commit makes multiple changes to koza.io.utils.open_resource

- Adds support for opening tar files.

- Handles archives (zip and tar) in the same way that the old
`file_archive` source configuration did: it assumes all files in an
archive are of the same format (CSV, JSONL, etc.). It will likely be
future work to allow a way to specify that only certain files in an
archive should be handled.

- Adds more robust checking for gzip compression than checking for a
`.gz` extension.

- open_resource() now returns one or more SizedResource objects that
indicate the size of the resource being opened, and a `.tell()` method
that indicates the position being read in that resource. This will be
necessary to add some sort of progress bar in the future.

- Resources downloaded from the Web now use the same logic as local
files to check for compression/archives.

- Importantly, the resources returned by `open_resource` *are not
automatically closed*. This was inconsistent in the previous version. It
is up to the consumer of the function to explicitly close resources.

- Adds more tests for compressed and archival formats.

- Small typing changes for other koza.io.utils functions, adding
Optional where appropriate
This was not working correctly with the discriminated union field
I realized at some point that creating a map from a reader file is just
a type of transform. This change in the configuration makes achieving
that possible.

A map transform is just a transform that relies on two additional
configuration keys: `key` and `values`. To make passing those values in
a YAML config possible, this commit makes it so that any extra fields in
the configuration are parsed into an `extra_fields` field in a
transform.
This makes config creation more lenient. Note that this means it's
possible to have an empty transform. The lack of a transform would be
detected when a KozaRunner is run.
Also remove unnecessary `files=[]` calls, since that is the default as
of eaff691.
This allows a transform to be defined as a module (resolvable from
PATH), e.g. `mypackage.transforms.example_transform`, rather than having
to define it as a file (`/home/user/code/mypackage/transforms/example_transform.py`)

This allows the possibility of creating generic transforms that can be
packaged, installed, and re-used, without having to track down the
filename of the python file where the transform code is located.
This commit builds on the changes in a60c607, bfa87d3, and eaff691.
It fully implements the mapping functionality that was present in the
previous method of writing transforms, although with a new API.

Instead of being given a large dict-of-dicts with mappings defined for
terms, a method is passed via the KozaTransform object used in a
transform, where a map lookup is done like so:

    def transform(koza: KozaTransform):
        term = "example"
        mapped_term = koza.lookup(term, "column_b")

...where the map was loaded from a CSV file that might look like this:

    id,column_a,column_b
    example,alias1,alias2

...resulting in mapped_term evaluating to `"alias2"`.
Patrick Golden and others added 24 commits January 28, 2025 13:10
* Use match statement for header detection logic

* Remove unused line_num and line_count variables
Formatting, renaming variables in tests, removing unnecessary config params
* Use [project] instead of [tool.poetry]

* Set minimum python version to 3.10

* Use pyupgrade lint rules
I had switched to itertuples (from iterrows), but didn't change how rows
were interacted with.
@kevinschaper kevinschaper self-requested a review July 10, 2025 20:05
kevinschaper (Member) commented:

I'm going to go ahead and bring this into main, but not make a release yet.

@kevinschaper kevinschaper merged commit ab47894 into main Jul 10, 2025
4 checks passed
@kevinschaper kevinschaper deleted the koza-api-new branch July 10, 2025 20:06