my-first-spec test #1

@PedroMTQ

I'm trying out your framework and I'm having an issue with the first spec test from https://kanoniv.com/docs/getting-started/first-reconciliation.html

I set up the spec as instructed and ran test.py, but I'm getting a merge rate of 0%.

The spec:

api_version: kanoniv/v1
identity_version: "1.0"

entity:
  name: customer

sources:
  crm:
    adapter: csv
    location: crm.csv
    primary_key: id
    schema:
      name: { type: string }
      email: { type: string, pii: true }
      phone: { type: string, pii: true }
      company: { type: string }

  billing:
    adapter: csv
    location: billing.csv
    primary_key: customer_id
    schema:
      full_name: { type: string }
      email: { type: string, pii: true }
      card_last4: { type: string }
      plan: { type: string }

rules:
  - name: email_exact
    type: exact
    field: email
    weight: 1.0

decision:
  thresholds:
    match: 0.9
    review: 0.5

survivorship:
  strategy: source_priority
  source_order: [crm, billing]
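
For context on what this spec should produce, here is a minimal sketch of the decision step. It assumes a pair's score is simply the weight of the one rule that fired (1.0 for `email_exact`) and that thresholds are applied with `>=`. Both are assumptions for illustration, not confirmed kanoniv internals, and `classify` is a hypothetical helper rather than part of the library.

```python
# Hypothetical helper, NOT part of kanoniv: applies the thresholds from the
# spec's `decision` block to a pair score.
def classify(score: float, match: float = 0.9, review: float = 0.5) -> str:
    """Return 'match', 'review', or 'no_match' for a pair score."""
    if score >= match:
        return "match"
    if score >= review:
        return "review"
    return "no_match"


# Assumption: an exact email hit contributes the rule's full weight (1.0).
print(classify(1.0))  # exact email match
print(classify(0.0))  # no rule fired
```

Under those assumptions, any two records with identical emails would clear the 0.9 match threshold.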

Code:

from kanoniv import Source, Spec, plan, reconcile, validate

spec = Spec.from_file('./tests/kanoniv/my-first-spec.yaml')
result = validate(spec)

if result:
    print("Spec is valid!")
else:
    for error in result.errors:
        print(f"  {error}")
    raise SystemExit(1)


execution_plan = plan(spec)
print(execution_plan.summary())


sources = [
    Source.from_csv("crm", './tests/kanoniv/crm.csv'),
    Source.from_csv("billing", './tests/kanoniv/billing.csv'),
]

result = reconcile(sources, spec)

for i, cluster in enumerate(result.clusters):
    print(f"Cluster {i + 1}: {cluster}")

for record in result.golden_records:
    print(record)

for decision in result.decisions:
    print(decision)
print(f"Clusters: {result.cluster_count}")
print(f"Golden records: {len(result.golden_records)}")
print(f"Merge rate: {result.merge_rate:.1%}")

df = result.to_pandas()

print(df)

Console output:

Spec is valid!
  Identity:     customer (1.0)
  Sources:      0 ()
  Signals:      email (exact, w=1)
  Blocking:     none
  Thresholds:   merge >= 0.9, review >= 0.5
  Stages:       8 execution stages
  Survivorship: 0 fields configured
  Risk flags:   1 critical, 1 high, 1 medium
  Plan hash:    sha256:bb0674d0...
Cluster 1: ['39cecfed-2dd5-41b3-b286-b80eb7c173cc']
Cluster 2: ['41591635-f3e6-48dc-925a-7b6d10dfe8de']
Cluster 3: ['49adf63c-a7b7-411f-b9b6-42dc2fb57c90']
Cluster 4: ['77daae89-d0e8-41be-ae26-d2e3d6ac8c51']
Cluster 5: ['7cb61d5e-b6e8-40ee-93ba-60018b71b7d3']
Cluster 6: ['8cff1e3a-46a2-4682-9c4c-1c1f0173a2ce']
Cluster 7: ['a27478e5-00ed-4ce5-bc35-d0c09f062e75']
Cluster 8: ['bd9c2f0f-65ab-4d18-ba67-2e60c38c3e24']
Cluster 9: ['c0096fb7-49f3-4aae-b769-ddb25fefcb0f']
Cluster 10: ['e85a4d9c-deb6-43b1-9ed3-dcf6eba63072']
{'company': 'BigCo', 'email': 'charlie@bigco.com', 'id': '5', 'kanoniv_id': 'knv_d62c42882874', 'member_count': '1', 'name': 'Charlie Davis', 'phone': ''}
{'card_last4': '9999', 'customer_id': 'cus_005', 'email': 'charlie@bigco.com', 'full_name': 'Charlie D.', 'kanoniv_id': 'knv_533fb1d53457', 'member_count': '1', 'plan': 'enterprise'}
{'card_last4': '3333', 'customer_id': 'cus_006', 'email': 'eve@newco.com', 'full_name': 'Eve Martinez', 'kanoniv_id': 'knv_711ce874e55b', 'member_count': '1', 'plan': 'pro'}
{'company': 'Acme Corp', 'email': 'john@acme.com', 'id': '1', 'kanoniv_id': 'knv_4bfad115cde1', 'member_count': '1', 'name': 'John Doe', 'phone': '555-0101'}
{'card_last4': '5678', 'customer_id': 'cus_003', 'email': 'bob@acme.com', 'full_name': 'Robert Wilson', 'kanoniv_id': 'knv_6ab33f2ec631', 'member_count': '1', 'plan': 'starter'}
{'card_last4': '4242', 'customer_id': 'cus_001', 'email': 'john@acme.com', 'full_name': 'Jonathan Doe', 'kanoniv_id': 'knv_d644027fdb84', 'member_count': '1', 'plan': 'enterprise'}
{'company': 'Globex Inc', 'email': 'jane.smith@globex.com', 'id': '2', 'kanoniv_id': 'knv_8e748398dfbc', 'member_count': '1', 'name': 'Jane Smith', 'phone': '555-0102'}
{'card_last4': '1234', 'customer_id': 'cus_002', 'email': 'jane.smith@globex.com', 'full_name': 'Jane Smith', 'kanoniv_id': 'knv_c9f45818cd98', 'member_count': '1', 'plan': 'pro'}
{'company': 'Acme Corp', 'email': 'bob@acme.com', 'id': '3', 'kanoniv_id': 'knv_d55332253457', 'member_count': '1', 'name': 'Bob Wilson', 'phone': '555-0103'}
{'company': 'Startup LLC', 'email': 'alice@startup.io', 'id': '4', 'kanoniv_id': 'knv_67da5dddd97e', 'member_count': '1', 'name': 'Alice Brown', 'phone': '555-0104'}
Clusters: 10
Golden records: 10
Merge rate: 0.0%
       company                  email   id        kanoniv_id member_count           name     phone card_last4 customer_id      full_name        plan
0        BigCo      charlie@bigco.com    5  knv_d62c42882874            1  Charlie Davis                  NaN         NaN            NaN         NaN
1          NaN      charlie@bigco.com  NaN  knv_533fb1d53457            1            NaN       NaN       9999     cus_005     Charlie D.  enterprise
2          NaN          eve@newco.com  NaN  knv_711ce874e55b            1            NaN       NaN       3333     cus_006   Eve Martinez         pro
3    Acme Corp          john@acme.com    1  knv_4bfad115cde1            1       John Doe  555-0101        NaN         NaN            NaN         NaN
4          NaN           bob@acme.com  NaN  knv_6ab33f2ec631            1            NaN       NaN       5678     cus_003  Robert Wilson     starter
5          NaN          john@acme.com  NaN  knv_d644027fdb84            1            NaN       NaN       4242     cus_001   Jonathan Doe  enterprise
6   Globex Inc  jane.smith@globex.com    2  knv_8e748398dfbc            1     Jane Smith  555-0102        NaN         NaN            NaN         NaN
7          NaN  jane.smith@globex.com  NaN  knv_c9f45818cd98            1            NaN       NaN       1234     cus_002     Jane Smith         pro
8    Acme Corp           bob@acme.com    3  knv_d55332253457            1     Bob Wilson  555-0103        NaN         NaN            NaN         NaN
9  Startup LLC       alice@startup.io    4  knv_67da5dddd97e            1    Alice Brown  555-0104        NaN         NaN            NaN         NaN
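
As a cross-check that does not depend on kanoniv, the sketch below groups the ten records printed above by identical email string. The (source, key, email) triples are copied from the golden records in the console output. `group_by_exact_email` is a hypothetical helper, and plain string equality stands in for whatever the `exact` rule does internally.

```python
from collections import defaultdict

# (source, primary key, email) copied from the golden records printed above.
records = [
    ("crm", "1", "john@acme.com"),
    ("crm", "2", "jane.smith@globex.com"),
    ("crm", "3", "bob@acme.com"),
    ("crm", "4", "alice@startup.io"),
    ("crm", "5", "charlie@bigco.com"),
    ("billing", "cus_001", "john@acme.com"),
    ("billing", "cus_002", "jane.smith@globex.com"),
    ("billing", "cus_003", "bob@acme.com"),
    ("billing", "cus_005", "charlie@bigco.com"),
    ("billing", "cus_006", "eve@newco.com"),
]


def group_by_exact_email(rows):
    """Group records whose email strings are identical (no normalization)."""
    groups = defaultdict(list)
    for source, key, email in rows:
        groups[email].append((source, key))
    return dict(groups)


groups = group_by_exact_email(records)
merged = {email: members for email, members in groups.items() if len(members) > 1}

print(f"Groups: {len(groups)}")
print(f"Groups with more than one record: {len(merged)}")
for email, members in sorted(merged.items()):
    print(email, members)
```

With this data, four emails appear in both sources, so an exact match on email would give 6 groups instead of the 10 single-record clusters reported above.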
