Deduplication / uniqueness not supported? Merging based on custom logic seems required for cross-device sync. #117

rkhamilton · 2025-08-11T15:22:00Z

When syncing between devices on the same Apple ID, business logic may require that some data be unique based on one or more fields, but CloudKit does not support uniqueness constraints. As a result, each device may upload its own copy of the same data to CloudKit, and the duplicates must later be merged using custom logic. In this scenario we are not updating an entity where the last update wins; we are creating a new entity that should be merged into an existing one, preserving the relationships that existed on the original entities. For example:

The classic example from Apple's demo code is Tags. A tag A is added to a photo on a phone. Later an iPad is opened, and before it syncs to CloudKit, another tag A with the same name is added to another photo. Tags should be unique based on name, and so both tags should be merged into the same entity. Any tagged photo should still be tagged with the merged tag A, regardless of which of the two duplicates it was originally tagged with.
A budgeting app downloads transactions from their bank on a device before it has synced to CloudKit, downloading the same transactions that another device has downloaded. These transactions have the same unique id from the bank, and should be merged.
An app seeds the database with initial data on first launch. If a second device does not first sync to CloudKit, it may seed the same data twice. On sync, the data sets should be merged based on an appropriate field(s).

I have never worked on an app that had cross device sync where data deduplication wasn't a problem that needed to be solved. It's mostly about syncing globally unique strings, like tags on photos, or we are downloading data from an API that is processed and synced to CloudKit, and multiple devices will fetch the same data if they don't know another device has already fetched it.

Existing solutions

I'm aware of three sample Core Data projects from Apple that solve this problem, starting with WWDC 2012. The overall flow uses persistent history tracking in this process:

Tag transactions with a "transaction author", e.g. the name of the app.
When notified of a remote change, fetch all transactions that are a) newer than the last history fetch and b) not authored by "APP_NAME". Transactions without the app's author name will have come from CloudKit, so this lets you fetch only cloud transactions. Then filter these CloudKit transactions down to the ones involving entities that need to be deduplicated, e.g. Tags.
Given a list of Tags transactions from CloudKit, process them to remove duplicates using custom logic (e.g. name fields are equal, or transaction id matches) as compared to the current on-device data.
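
The flow above can be sketched in plain Swift. This is a simplified model and not Core Data code: `HistoryTransaction`, `Change`, and `remoteTagNames` are hypothetical stand-ins for what persistent history tracking would supply, and the author string follows the "APP_NAME" convention described above.

```swift
import Foundation

// Hypothetical stand-ins for persistent history records (not Core Data types).
struct Change {
    let entityName: String   // e.g. "Tag" or "Photo"
    let tagName: String?     // only set for Tag changes
}

struct HistoryTransaction {
    let timestamp: Date
    let author: String?      // nil when the change was imported from CloudKit
    let changes: [Change]
}

// Step 1: local saves are tagged with this author.
let appAuthor = "APP_NAME"

/// Steps 2 and 3: keep transactions newer than the last fetch that this app
/// did not author, then collect the names of the Tags they touched.
func remoteTagNames(in history: [HistoryTransaction], since lastFetch: Date) -> Set<String> {
    let remote = history.filter { $0.timestamp > lastFetch && $0.author != appAuthor }
    let names = remote.flatMap(\.changes)
        .filter { $0.entityName == "Tag" }
        .compactMap(\.tagName)
    return Set(names)
}

let lastFetch = Date(timeIntervalSinceReferenceDate: 1_000)
let history = [
    // Too old: already processed by an earlier fetch.
    HistoryTransaction(timestamp: Date(timeIntervalSinceReferenceDate: 500), author: nil,
                       changes: [Change(entityName: "Tag", tagName: "Old")]),
    // Written by this app, so it is not a CloudKit import.
    HistoryTransaction(timestamp: Date(timeIntervalSinceReferenceDate: 2_000), author: appAuthor,
                       changes: [Change(entityName: "Tag", tagName: "Local")]),
    // New and remote: the Tag change is a deduplication candidate.
    HistoryTransaction(timestamp: Date(timeIntervalSinceReferenceDate: 3_000), author: nil,
                       changes: [Change(entityName: "Tag", tagName: "Family"),
                                 Change(entityName: "Photo", tagName: nil)]),
]

// Names to compare against on-device Tags when looking for duplicates.
let candidates = remoteTagNames(in: history, since: lastFetch)
```

The resulting set of names is what the custom merge logic would then compare against the Tags already on the device.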

Apple has provided two more modern sample code projects demonstrating the solution in detail:

The Apple sample project Synchronizing a local store to the cloud has Tags that are deduplicated based on name. There is an edge case bug in this code where relationships may be lost when deduplicated, but the code is simpler to understand.

The problem of lost references is solved in their newest (iOS 17.4), more complex example project Sharing Core Data objects between iCloud users. This example also removes duplicate Tags based on name, but uses somewhat more complex logic to avoid the relationship-deletion problem in the earlier example. In this project, see Persistence/PersistenceController+Deduplicate.swift. The solution is essentially to flag entities as "to be deduplicated" and remove them after a delay. The file referenced above includes a comment that explains their process.

Swift Data

Swift Data was launched without support for Persistent History Tracking, and this made it impossible to deduplicate efficiently. Persistent History Tracking was added to Swift Data in iOS 18, but it does not seem to be full-featured enough to solve the problem correctly. I admit to not having looked closely at the persistent history tracking feature in Swift Data so perhaps it is now possible.

Conclusion

This may just be a reframing of the problem that CloudKit doesn't support @unique attributes, but it is a problem that must be solved for cross device sync. Apple chose to solve the lack of uniqueness constraints in CloudKit using complex on-device merge logic.

I've looked through the sharing-grdb example projects and documentation, and it doesn't seem possible in the current beta either to enforce uniqueness during upload or to merge based on properties when downloading changes. What are your thoughts on how this uniqueness/deduplication problem might be addressed? Perhaps I am missing some functionality.

mbrandonw · 2025-08-11T16:02:32Z

Maintainer

Hi @rkhamilton, we have thought about adding a customizable uniqueness conflict handler to the library so that when we are inserting data from CK into the user's local database and SQLite emits SQLITE_CONSTRAINT_UNIQUE, we invoke the custom handler and let the user of the library do whatever custom logic they want. The main reason we have held off on this is because we are looking for good use cases to explore so that we can make sure this tool is useful.

But, as it turns out, we feel there are alternative ways to handle the 3 use cases you gave above that do not require any additional infrastructure from the library. I'll explain each of them individually:

Tags should be unique based on name, and so both tags should be merged into the same entity.

If you were to model your tags table like so:

CREATE TABLE "tags" (
  "title" TEXT PRIMARY KEY NOT NULL,
  …
)

…then you would get conflict resolution for free from our library. When synchronizing, we always upsert data based on primary key (which is the title of the tag) and conflicting records are merged using a last-edit-wins strategy on a per-field basis. You can even add "COLLATE NOCASE" if you want case-insensitivity in the uniqueness.

And so if an iPhone created a photo with tag "Family" and an iPad created a different photo with tag "Family", then when the devices synchronize it should work just fine. Both new photos will be properly tagged with the unique "Family" tag.
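
To make "upsert on primary key, then last-edit-wins per field" concrete, here is a minimal in-memory sketch. It is not the library's implementation: `Versioned`, `TagRow`, `merge`, and `upsert` are hypothetical names, and the per-field timestamps are an assumption about how edits could be compared.

```swift
import Foundation

// Hypothetical model of one synced field: a value plus when it was last edited.
struct Versioned<Value: Equatable>: Equatable {
    var value: Value
    var modifiedAt: Date
}

// Hypothetical row; `title` plays the role of the primary key.
struct TagRow: Equatable {
    let title: String
    var color: Versioned<String>
    var isPinned: Versioned<Bool>
}

/// Per-field last-edit-wins: for each field keep whichever side was edited later.
/// Ties keep the local value here; a real system needs a deterministic tie-break.
func merge(_ local: TagRow, _ remote: TagRow) -> TagRow {
    precondition(local.title == remote.title, "merge only applies to the same primary key")
    func newer<V: Equatable>(_ a: Versioned<V>, _ b: Versioned<V>) -> Versioned<V> {
        a.modifiedAt >= b.modifiedAt ? a : b
    }
    return TagRow(
        title: local.title,
        color: newer(local.color, remote.color),
        isPinned: newer(local.isPinned, remote.isPinned)
    )
}

/// Upsert keyed by primary key: insert when absent, merge when present.
func upsert(_ incoming: TagRow, into table: inout [String: TagRow]) {
    if let existing = table[incoming.title] {
        table[incoming.title] = merge(existing, incoming)
    } else {
        table[incoming.title] = incoming
    }
}

let earlier = Date(timeIntervalSinceReferenceDate: 100)
let later = Date(timeIntervalSinceReferenceDate: 200)
var tags: [String: TagRow] = [:]

// The iPhone's "Family" tag: color edited later, pin state edited earlier.
upsert(TagRow(title: "Family",
              color: .init(value: "blue", modifiedAt: later),
              isPinned: .init(value: false, modifiedAt: earlier)), into: &tags)
// The iPad's "Family" tag: color edited earlier, pin state edited later.
upsert(TagRow(title: "Family",
              color: .init(value: "red", modifiedAt: earlier),
              isPinned: .init(value: true, modifiedAt: later)), into: &tags)
```

After both upserts there is a single "Family" row that takes its color from the iPhone and its pin state from the iPad, because each field is resolved independently.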

Now using something like tag title as a primary key does come with new challenges because you may want to allow the user to edit the tag, but SQLite deals with them well. When creating foreign keys to the "tags" table you will want to make sure to use "ON UPDATE CASCADE" so that editing the title of a tag will update all of its foreign keys.
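
As a rough illustration of that foreign key setup, the SQL could look like the strings below. The "photosTags" join table, its column names, and the sample rename are hypothetical and not taken from the library or its demos; note also that SQLite only enforces foreign keys when `PRAGMA foreign_keys = ON` is in effect.

```swift
import Foundation

// Hypothetical schema: only "tags"."title" comes from the discussion above.
let createTags = """
CREATE TABLE "tags" (
  "title" TEXT PRIMARY KEY NOT NULL
)
"""

// The join row references the tag by title. ON UPDATE CASCADE makes SQLite
// rewrite "tagTitle" whenever the referenced "tags"."title" changes.
let createPhotosTags = """
CREATE TABLE "photosTags" (
  "id" TEXT PRIMARY KEY NOT NULL,
  "photoID" TEXT NOT NULL,
  "tagTitle" TEXT NOT NULL
    REFERENCES "tags"("title")
    ON UPDATE CASCADE
)
"""

// Renaming a tag then updates every "photosTags" row that pointed at it.
let renameTag = """
UPDATE "tags" SET "title" = 'Relatives' WHERE "title" = 'Family'
"""
```

These statements would be run through whatever migration mechanism the app already uses; the strings are kept separate here only so each piece can be read on its own.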

A budgeting app downloads transactions from their bank on a device before it has synced to CloudKit, downloading the same transactions that another device has downloaded. These transactions have the same unique id from the bank, and should be merged.

I believe this all works out of the box right now. The unique ID from the bank should be the primary key of the table, and as I mentioned above, primary keys are treated specially in the library. When records are sync'd from CK we perform an upsert based on the PK and then resolve conflicts on a per-field basis. There will be no duplicated data in this situation.

An app seeds the database with initial data on first launch. If a second device does not first sync to CloudKit, it may seed the same data twice. On sync, the data sets should be merged based on an appropriate field(s).

This too should already work just fine, as long as your seeded data has stable primary keys. That is, the data seeded from device A generates the same primary keys as device B. If that is upheld, then again there will be no duplicated data even when multiple devices sync the same seeded data.
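
A small sketch of what "stable primary keys" can mean for seed data: the IDs are hard-coded rather than generated with `UUID()` at seed time. `SeedCategory`, the category names, and the UUID literals are hypothetical, and the dictionary only simulates a table keyed by primary key.

```swift
import Foundation

// Hypothetical seed row; the names and UUID literals are illustrative only.
struct SeedCategory: Equatable {
    let id: UUID
    let name: String
}

/// Seed data with hard-coded IDs: every device produces identical primary keys.
/// Calling UUID() here instead would give each device different keys.
func seedCategories() -> [SeedCategory] {
    [
        SeedCategory(id: UUID(uuidString: "00000000-0000-0000-0000-000000000001")!, name: "Groceries"),
        SeedCategory(id: UUID(uuidString: "00000000-0000-0000-0000-000000000002")!, name: "Rent"),
    ]
}

// Simulate two devices seeding independently, then syncing into one table
// keyed by primary key.
let deviceA = seedCategories()
let deviceB = seedCategories()
var synced: [UUID: SeedCategory] = [:]
for row in deviceA + deviceB {
    synced[row.id] = row   // same key, so the second copy replaces rather than duplicates
}
```

Because both devices generate the same keys, the combined data set still contains only two rows.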

So, as far as I can tell, the main situations you are concerned with are already handled by the library. If there are other situations in which non-primary key fields need to have uniqueness constraints, then we may consider adding that constraint failure handler I mentioned at the beginning of this post, but we first need a rock solid use case to test it out in full.

We can also update our reminders demo app to make it so that the tags table has its title as the primary key. That will show how this can work in practice.

But we do actually already have one example of the techniques I described above. Our reminders app allows one to associate cover images to lists. We store the image data in a separate table outside of the "remindersLists" table, and that table uses "remindersLists"."id" as both its primary key and a foreign key:

https://github.com/pointfreeco/sharing-grdb/blob/29930e7f74e2330118e32a78d44c48a8f40266b8/Examples/Reminders/Schema.swift#L157-L161

So that naturally gives us a uniqueness constraint that ensures there is at most one image associated with a reminders list, and if two devices create images at the same time, the records will be merged on a per-field basis.

0 replies

rkhamilton · 2025-08-11T16:34:55Z

Author

Hi @mbrandonw, thank you for the thoughtful response. I can see that my thinking on this topic is heavily shaped by thinking in terms of Core Data object instances, where each is created with a unique persistent identifier, rather than database rows. Your explanations make sense to me.

I had reviewed your example apps and didn't see a way to generate an upsert / deduplication scenario, which contributed to my thinking it wasn't supported. I think it would be helpful to also make it possible to add Tags to the Reminders app in addition to the primary key change (right now they are read-only with no UI to add or edit).

As the documentation is developed I think it would also be helpful to explicitly discuss this topic, as it's a non-trivial consideration for people coming from Core Data / Swift Data, and it may be a non-issue with sharing-grdb. That's a nice benefit of the library!

2 replies

mbrandonw Aug 11, 2025
Maintainer

Hi @rkhamilton, thanks that is all good feedback. We definitely hope to improve the docs and examples to properly show what is possible.

mbrandonw Aug 11, 2025
Maintainer

Just opened this PR to show how to deal with unique tag titles in the reminders app: https://github.com/pointfreeco/sharing-grdb/pull/119

Let us know if you have any questions or any further things to discuss.

rkhamilton · 2025-08-12T11:55:27Z

Author

Hi @mbrandonw I realized last night that I have a real world example that is not transparently solved by the behavior you describe because it requires uniqueness on both a property and a relationship. I made a Swift Data app for the iOS 17 launch that is a kind of weather journal. Users identify locations on a map, and the app uses a weather API to download daily weather summaries, which are persisted and summarized. As a toy example:

import Foundation
import SwiftData

@Model
public final class Location {
    public var id: UUID = UUID()

    // Deleting a Location also deletes its weather records.
    @Relationship(deleteRule: .cascade, inverse: \WeatherRecord.location)
    public var weatherRecords: [WeatherRecord]? = []
}

@Model
public final class WeatherRecord: Identifiable {
    public var id: UUID = UUID()
    public var date: Date = Date.now
    public var dailyHighTemperature: Double? // data from the weather API

    @Relationship(deleteRule: .nullify)
    public var location: Location?
}

The business logic is that we are building a calendar of historical data, with one WeatherRecord per day, per location. So the uniqueness constraint for a WeatherRecord table is that they must be unique for both date and Location. The lack of good deduplication support in Swift Data prevented me from ever enabling CloudKit support in this app because each device would download and persist the same weather API data for each location.

I could imagine that I would be able to use your existing uniqueness functionality to solve this by constructing a primary key for WeatherRecord that is composed of its owning Location's UUID plus its own date. Something like "(self.location.id.uuidString)+(self.date.timeIntervalSinceReferenceDate)", which would be unique for the WeatherRecord table. Does this seem like the right way to solve this problem using sharing-grdb? I'm not a database person, so I don't know if primary key solutions like this are normal or if that smells strange to you.
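
For illustration, one possible shape of such a derived key is sketched below. `weatherRecordKey` is a hypothetical helper, and normalizing to a UTC calendar day is an assumption: using the raw timeIntervalSinceReferenceDate would produce different keys for two records fetched at different times on the same day.

```swift
import Foundation

/// Hypothetical helper: derive a stable key from a location ID and a calendar day.
/// The day is computed in UTC (an assumption) so all devices agree on it.
func weatherRecordKey(locationID: UUID, date: Date) -> String {
    var calendar = Calendar(identifier: .gregorian)
    calendar.timeZone = TimeZone(secondsFromGMT: 0)!
    let day = calendar.dateComponents([.year, .month, .day], from: date)

    // Zero-pad so keys have a fixed width and sort chronologically.
    func pad(_ number: Int, _ width: Int) -> String {
        let digits = String(number)
        return String(repeating: "0", count: max(0, width - digits.count)) + digits
    }
    return "\(locationID.uuidString)|\(pad(day.year!, 4))-\(pad(day.month!, 2))-\(pad(day.day!, 2))"
}

let locationID = UUID(uuidString: "00000000-0000-0000-0000-0000000000AA")!
let midnight = Date(timeIntervalSinceReferenceDate: 0)            // 2001-01-01 00:00 UTC
let sameDayLater = Date(timeIntervalSinceReferenceDate: 5 * 3600) // 2001-01-01 05:00 UTC
let nextDay = Date(timeIntervalSinceReferenceDate: 86_400)        // 2001-01-02 00:00 UTC

let keyA = weatherRecordKey(locationID: locationID, date: midnight)
let keyB = weatherRecordKey(locationID: locationID, date: sameDayLater)
let keyC = weatherRecordKey(locationID: locationID, date: nextDay)
```

Two fetches for the same location on the same UTC day yield the same key, while the next day yields a different one.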

1 reply

mbrandonw Aug 12, 2025
Maintainer

Hi @rkhamilton, thanks for this example. This is a good use case for uniqueness constraints that are not naturally primary keys, and it's similar to a use case that @dave256 has brought up before that I forgot about (in his case it was a uniqueness constraint on a course ID and a class date).

Here’s one way you can enforce this uniqueness constraint without any new tools from us. You can create a trigger on your database that listens for inserts into the WeatherRecord table and detects when a new row is inserted that has a duplicate (locationID, date) pair, and then chooses to delete one of the duplicates. However, you must delete a row in such a way that any device that detects a uniqueness constraint failure will delete the same row. One way to do this might be to just delete the one with the smallest id. Here’s how you can do that:

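WeatherRecord.createTemporaryTrigger(
  after: .insert { new in
    // Delete the duplicate with the smallest id so that every device
    // removes the same row.
    WeatherRecord
      .delete()
      .where {
        $0.locationID.eq(new.locationID)
          && $0.date.eq(new.date)
      }
      .order(by: \.id)
      .limit(1)
  } when: { new in
    // Only fire when the (locationID, date) pair now appears more than once.
    WeatherRecord.where {
      $0.locationID.eq(new.locationID)
        && $0.date.eq(new.date)
    }
    .count() > 1
  }
)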

I believe that will mostly work, but there may be some edge cases to think through.
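
To see why a deterministic choice of which row to delete matters, here is a small in-memory model of the trigger's rule. It is only a sketch of the ordering argument, not of the library or of SQLite: `Row` and `insert` are hypothetical, and integer ids stand in for real primary keys.

```swift
import Foundation

// Hypothetical in-memory stand-in for the WeatherRecord table.
struct Row: Hashable {
    let id: Int
    let locationID: Int
    let day: String
}

/// Mimics the trigger: after each insert, if more than one row shares the
/// (locationID, day) pair, delete the one with the smallest id.
func insert(_ new: Row, into table: inout [Row]) {
    table.append(new)
    let duplicates = table.filter { $0.locationID == new.locationID && $0.day == new.day }
    if duplicates.count > 1, let loser = duplicates.min(by: { $0.id < $1.id }) {
        table.removeAll { $0 == loser }
    }
}

let first = Row(id: 1, locationID: 7, day: "2025-08-12")
let second = Row(id: 2, locationID: 7, day: "2025-08-12")

// The phone created `first` locally and later receives `second` from sync.
var phone: [Row] = []
insert(first, into: &phone)
insert(second, into: &phone)

// The tablet sees the same two rows in the opposite order.
var tablet: [Row] = []
insert(second, into: &tablet)
insert(first, into: &tablet)
```

Both devices end up holding only the row with the larger id, regardless of arrival order. A rule such as "delete the row that was just inserted" would instead leave the two devices with different survivors.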

A better way to do this, and something we do want to support in the future but haven’t yet cracked it, is to allow for compound primary keys. That would allow you to combine the location ID and date into a single ID struct to serve as the primary key:

@Table
struct WeatherRecord {
  let id: ID
  var dailyHighTemperature: Double?

  struct ID {
    var date: Date
    var locationID: Location.ID
  }
}

…and then you would get the uniqueness constraint for free.

We definitely want this to be possible eventually, but we don’t have a timeline for it yet.
