my-first-spec test #1
Open
Description
I'm trying out your framework and I'm having an issue with the first spec test from https://kanoniv.com/docs/getting-started/first-reconciliation.html
I set up the spec as instructed and ran test.py, but I get a merge rate of 0%, even though four emails appear in both sources (see the output below).
The spec:
api_version: kanoniv/v1
identity_version: "1.0"
entity:
  name: customer
sources:
  crm:
    adapter: csv
    location: crm.csv
    primary_key: id
    schema:
      name: { type: string }
      email: { type: string, pii: true }
      phone: { type: string, pii: true }
      company: { type: string }
  billing:
    adapter: csv
    location: billing.csv
    primary_key: customer_id
    schema:
      full_name: { type: string }
      email: { type: string, pii: true }
      card_last4: { type: string }
      plan: { type: string }
rules:
  - name: email_exact
    type: exact
    field: email
    weight: 1.0
decision:
  thresholds:
    match: 0.9
    review: 0.5
survivorship:
  strategy: source_priority
  source_order: [crm, billing]
Code:
from kanoniv import Spec, validate
from kanoniv import Source, reconcile
from kanoniv import plan

# Load the spec and stop early if it does not validate.
spec = Spec.from_file('./tests/kanoniv/my-first-spec.yaml')
result = validate(spec)
if result:
    print("Spec is valid!")
else:
    for error in result.errors:
        print(f" {error}")
    raise SystemExit(1)

# Print the execution plan summary.
execution_plan = plan(spec)
print(execution_plan.summary())

# Load both CSV sources and reconcile them against the spec.
sources = [
    Source.from_csv("crm", './tests/kanoniv/crm.csv'),
    Source.from_csv("billing", './tests/kanoniv/billing.csv'),
]
result = reconcile(sources, spec)

# Dump clusters, golden records, and decisions.
for i, cluster in enumerate(result.clusters):
    print(f"Cluster {i + 1}: {cluster}")
for record in result.golden_records:
    print(record)
for decision in result.decisions:
    print(decision)

print(f"Clusters: {result.cluster_count}")
print(f"Golden records: {len(result.golden_records)}")
print(f"Merge rate: {result.merge_rate:.1%}")

df = result.to_pandas()
print(df)
Console output:
Spec is valid!
Identity: customer (1.0)
Sources: 0 ()
Signals: email (exact, w=1)
Blocking: none
Thresholds: merge >= 0.9, review >= 0.5
Stages: 8 execution stages
Survivorship: 0 fields configured
Risk flags: 1 critical, 1 high, 1 medium
Plan hash: sha256:bb0674d0...
Cluster 1: ['39cecfed-2dd5-41b3-b286-b80eb7c173cc']
Cluster 2: ['41591635-f3e6-48dc-925a-7b6d10dfe8de']
Cluster 3: ['49adf63c-a7b7-411f-b9b6-42dc2fb57c90']
Cluster 4: ['77daae89-d0e8-41be-ae26-d2e3d6ac8c51']
Cluster 5: ['7cb61d5e-b6e8-40ee-93ba-60018b71b7d3']
Cluster 6: ['8cff1e3a-46a2-4682-9c4c-1c1f0173a2ce']
Cluster 7: ['a27478e5-00ed-4ce5-bc35-d0c09f062e75']
Cluster 8: ['bd9c2f0f-65ab-4d18-ba67-2e60c38c3e24']
Cluster 9: ['c0096fb7-49f3-4aae-b769-ddb25fefcb0f']
Cluster 10: ['e85a4d9c-deb6-43b1-9ed3-dcf6eba63072']
{'company': 'BigCo', 'email': 'charlie@bigco.com', 'id': '5', 'kanoniv_id': 'knv_d62c42882874', 'member_count': '1', 'name': 'Charlie Davis', 'phone': ''}
{'card_last4': '9999', 'customer_id': 'cus_005', 'email': 'charlie@bigco.com', 'full_name': 'Charlie D.', 'kanoniv_id': 'knv_533fb1d53457', 'member_count': '1', 'plan': 'enterprise'}
{'card_last4': '3333', 'customer_id': 'cus_006', 'email': 'eve@newco.com', 'full_name': 'Eve Martinez', 'kanoniv_id': 'knv_711ce874e55b', 'member_count': '1', 'plan': 'pro'}
{'company': 'Acme Corp', 'email': 'john@acme.com', 'id': '1', 'kanoniv_id': 'knv_4bfad115cde1', 'member_count': '1', 'name': 'John Doe', 'phone': '555-0101'}
{'card_last4': '5678', 'customer_id': 'cus_003', 'email': 'bob@acme.com', 'full_name': 'Robert Wilson', 'kanoniv_id': 'knv_6ab33f2ec631', 'member_count': '1', 'plan': 'starter'}
{'card_last4': '4242', 'customer_id': 'cus_001', 'email': 'john@acme.com', 'full_name': 'Jonathan Doe', 'kanoniv_id': 'knv_d644027fdb84', 'member_count': '1', 'plan': 'enterprise'}
{'company': 'Globex Inc', 'email': 'jane.smith@globex.com', 'id': '2', 'kanoniv_id': 'knv_8e748398dfbc', 'member_count': '1', 'name': 'Jane Smith', 'phone': '555-0102'}
{'card_last4': '1234', 'customer_id': 'cus_002', 'email': 'jane.smith@globex.com', 'full_name': 'Jane Smith', 'kanoniv_id': 'knv_c9f45818cd98', 'member_count': '1', 'plan': 'pro'}
{'company': 'Acme Corp', 'email': 'bob@acme.com', 'id': '3', 'kanoniv_id': 'knv_d55332253457', 'member_count': '1', 'name': 'Bob Wilson', 'phone': '555-0103'}
{'company': 'Startup LLC', 'email': 'alice@startup.io', 'id': '4', 'kanoniv_id': 'knv_67da5dddd97e', 'member_count': '1', 'name': 'Alice Brown', 'phone': '555-0104'}
Clusters: 10
Golden records: 10
Merge rate: 0.0%
   company      email                  id   kanoniv_id        member_count  name           phone     card_last4  customer_id  full_name      plan
0  BigCo        charlie@bigco.com      5    knv_d62c42882874  1             Charlie Davis            NaN         NaN          NaN            NaN
1  NaN          charlie@bigco.com      NaN  knv_533fb1d53457  1             NaN            NaN       9999        cus_005      Charlie D.     enterprise
2  NaN          eve@newco.com          NaN  knv_711ce874e55b  1             NaN            NaN       3333        cus_006      Eve Martinez   pro
3  Acme Corp    john@acme.com          1    knv_4bfad115cde1  1             John Doe       555-0101  NaN         NaN          NaN            NaN
4  NaN          bob@acme.com           NaN  knv_6ab33f2ec631  1             NaN            NaN       5678        cus_003      Robert Wilson  starter
5  NaN          john@acme.com          NaN  knv_d644027fdb84  1             NaN            NaN       4242        cus_001      Jonathan Doe   enterprise
6  Globex Inc   jane.smith@globex.com  2    knv_8e748398dfbc  1             Jane Smith     555-0102  NaN         NaN          NaN            NaN
7  NaN          jane.smith@globex.com  NaN  knv_c9f45818cd98  1             NaN            NaN       1234        cus_002      Jane Smith     pro
8  Acme Corp    bob@acme.com           3    knv_d55332253457  1             Bob Wilson     555-0103  NaN         NaN          NaN            NaN
9  Startup LLC  alice@startup.io       4    knv_67da5dddd97e  1             Alice Brown    555-0104  NaN         NaN          NaN            NaN
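As a sanity check that does not involve kanoniv at all, here is a rough sketch of what an exact match on email should find in this data. It uses a plain pandas inner join as a stand-in for the `email_exact` rule, which is an assumption on my part and not how the library necessarily works internally. The ids and emails are copied from the golden records printed above; the variable names are just for illustration.

```python
import pandas as pd

# Ids and emails copied from the golden-record output above
# (only the columns needed for the comparison).
crm = pd.DataFrame(
    {
        "id": ["1", "2", "3", "4", "5"],
        "email": [
            "john@acme.com",
            "jane.smith@globex.com",
            "bob@acme.com",
            "alice@startup.io",
            "charlie@bigco.com",
        ],
    }
)
billing = pd.DataFrame(
    {
        "customer_id": ["cus_001", "cus_002", "cus_003", "cus_005", "cus_006"],
        "email": [
            "john@acme.com",
            "jane.smith@globex.com",
            "bob@acme.com",
            "charlie@bigco.com",
            "eve@newco.com",
        ],
    }
)

# Inner join on the exact email value: one row per cross-source match.
matches = crm.merge(billing, on="email", how="inner")

# Each email is unique within its source, so every matched pair
# should collapse two records into one cluster.
total_records = len(crm) + len(billing)
expected_clusters = total_records - len(matches)

print(matches[["id", "customer_id", "email"]])
print(f"Matched pairs: {len(matches)}")
print(f"Expected clusters: {expected_clusters}")
```

With this data the join finds 4 matched pairs, so I would expect 6 clusters rather than the 10 reported above.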