Core Data + CloudKit – Syncing Existing Data

Anyone with an app that uses both Core Data and CloudKit to sync data across devices will understand why NSPersistentCloudKitContainer might just be the sleeper hit of iOS 13. For Money Master, I was able to remove almost 1,000 lines of my most complex code, and I have more reliable sync as a result! It’s not all roses, though. If you already have users with data in the field, you may have noticed that existing data isn’t pushed to the new CloudKit zone, while new data is. This is apparently an intentional design choice, but Apple doesn’t provide any guidance, and the new APIs don’t make it obvious how to handle it. After a little experimentation, I’ve found an approach that seems to be a way forward.

Duplication, Duplication, Duplic…

My first pass, as suggested by Apple, was to simply change my NSPersistentContainer to an NSPersistentCloudKitContainer. New data synced over, but the existing data didn’t go anywhere. If I modified any of the existing records, however, those synced just fine. Sweet: just touch every record on first launch, and we’ll be good to go! Sadly, my joy was short-lived, as this caused a full copy of the data to be synced from each device. Not so good…

Taking a step back, this makes sense. When I built iCloud support into Money Master, I had the exact same problem, and I had to implement deduplication myself by adding a unique UUID to each record. Core Data, of course, knows about this property, but it has no idea what it’s used for. Each device essentially has a separate, independent database that just happens to contain the same data; the deduplication logic lives in my code!
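As a rough illustration of what that app-side deduplication involves, here is a minimal sketch in plain Swift. It assumes a hypothetical `SyncedRecord` type carrying the app-assigned `uuid` plus an assumed `modifiedAt` timestamp, and it keeps the newest copy of each UUID. This is one possible policy, not necessarily the rule Money Master uses.

```swift
import Foundation

// Hypothetical record type: `uuid` is the app-assigned identifier;
// `modifiedAt` is an assumed timestamp used to pick a winner among duplicates.
struct SyncedRecord {
    let uuid: UUID
    let title: String
    let modifiedAt: Date
}

/// Collapses records that share a UUID, keeping the most recently modified copy.
func deduplicate(_ records: [SyncedRecord]) -> [SyncedRecord] {
    var newest: [UUID: SyncedRecord] = [:]
    for record in records {
        // Skip this copy if we already hold one that is at least as recent.
        if let existing = newest[record.uuid], existing.modifiedAt >= record.modifiedAt {
            continue
        }
        newest[record.uuid] = record
    }
    // Sort by UUID string so the output order is deterministic.
    return newest.values.sorted { $0.uuid.uuidString < $1.uuid.uuidString }
}
```

Keeping the newest copy is only one choice; merging fields or always preferring the local copy are equally valid policies.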

It would have been awesome if Apple provided a way to mark a property on each entity for use as the CKRecord ID. That seems to me like it would have been an elegant solution, but alas, it’s not available.

A Tale of Two Persistent Containers

Instead of modifying my existing persistent container, I decided to create a second one. This lets me keep using the original container for certain configuration data that is best left on device, while also letting me pick and choose which records I copy into the new container. My Core Data stack looks something like this:

import CoreData

// Original local container
lazy var localContainer: NSPersistentContainer = {
  let container = NSPersistentContainer(name: "MyCoreDataModel")
  container.loadPersistentStores(completionHandler: { (storeDescription, error) in
    if let error = error as NSError? {
      fatalError("Unresolved error \(error), \(error.userInfo)")
    }
  })
  // The lazy initializer must return the configured container
  return container
}()

// New mirrored container
lazy var mirroredContainer: NSPersistentCloudKitContainer = {
  let container = NSPersistentCloudKitContainer(name: "MyCoreDataModel")
  let storeDirectory = FileManager.default.urls(for: .applicationSupportDirectory, in: .userDomainMask).first!

  // A separate SQLite file, so the mirrored store doesn't share a file with the local store
  let cloudStoreLocation = storeDirectory.appendingPathComponent("MirroredData.sqlite")
  let cloudStoreDescription = NSPersistentStoreDescription(url: cloudStoreLocation)

  // "Cloud" is the name of a second configuration in my Core Data model
  cloudStoreDescription.configuration = "Cloud"

  // Set the CloudKit container options on the cloud store (use your app's iCloud container identifier)
  cloudStoreDescription.cloudKitContainerOptions = NSPersistentCloudKitContainerOptions(containerIdentifier: "iCloud.")

  container.persistentStoreDescriptions = [
    cloudStoreDescription
  ]
  container.loadPersistentStores { (storeDescription, error) in
    if let error = error {
      fatalError("Could not load persistent stores \(error)")
    }
  }
  // The lazy initializer must return the configured container
  return container
}()

And what about those dupes?

Of course, that deduplication UUID is still there, so why not give it a second life? But how do we know which data is where? NSPersistentCloudKitContainer is designed as an ‘eventually consistent’ distributed system, and as such, it doesn’t provide a way to ask whether you have all of the data. After many (many!) different attempts, I realized the answer was much simpler than bending the new API to a use it wasn’t designed for: CloudKit itself hasn’t gone away, and it can be used to inspect the new containers!

From here, the way forward was fairly simple: load the data from CloudKit (in batches, if you need to), and load the same batch from my original persistent container. For every local record, if CloudKit has no record with the same UUID, it hasn’t been synced, so I store that record in my new and improved mirrored persistent container. If it does have a match, I leave it where it is. And there we have deduplication!
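The heart of that comparison is a set difference on UUIDs. This minimal sketch, in plain Swift with no Core Data or CloudKit dependency, assumes a hypothetical `LocalRecord` type and returns the local records whose UUIDs are absent from the CloudKit results, preserving their original order.

```swift
import Foundation

// Hypothetical stand-in for a record loaded from the original local container.
struct LocalRecord {
    let uuid: UUID
    let name: String
}

/// Returns the local records that have no matching UUID among the CloudKit results,
/// i.e. the records that still need to be copied into the mirrored container.
func recordsMissingFromCloud(_ local: [LocalRecord], cloudUUIDs: [UUID]) -> [LocalRecord] {
    // Build a Set once so each membership check is O(1) instead of a scan of the array.
    let cloudSet = Set(cloudUUIDs)
    return local.filter { !cloudSet.contains($0.uuid) }
}
```

If you fetch in batches, the cloud UUIDs you compare against need to cover the same records as the local batch; otherwise a record that was synced but landed in a different batch would look unsynced and be copied twice.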

import CloudKit
import CoreData

func migrate() {
  let privateDB = CKContainer.default().privateCloudDatabase
  // Core Data prepends 'CD_' to the record type name
  let query = CKQuery(recordType: "CD_RecordName", predicate: NSPredicate(value: true))
  // Yep, Core Data creates its own CloudKit zone
  let zoneId = CKRecordZone.ID(zoneName: "com.apple.coredata.cloudkit.zone", ownerName: CKCurrentUserDefaultName)

  // Note: this convenience method returns a limited number of results; a CKQueryOperation
  // with a cursor can page through larger data sets
  privateDB.perform(query, inZoneWith: zoneId) { cloudRecords, error in
    if let error = error {
      // Handle the error (e.g. retry later on network failures) and bail out
      print("CloudKit query failed: \(error)")
      return
    }

    // Another 'CD_' prefix, this time on the field name; skip missing or malformed values
    let cloudUUIDs = (cloudRecords ?? []).compactMap { record -> UUID? in
      guard let uuidString = record["CD_uuid"] as? String else { return nil }
      return UUID(uuidString: uuidString)
    }

    let localContainer = CoreDataStack.instance.localContainer
    // Same entity as 'CD_RecordName' above, but without the 'CD_' prefix
    let fetchRequest = NSFetchRequest<RecordName>(entityName: "RecordName")

    let context = localContainer.newBackgroundContext()
    context.performAndWait {
      do {
        let records = try context.fetch(fetchRequest)

        for record in records {
          // No matching UUID in CloudKit means this record hasn't been synced yet
          if !cloudUUIDs.contains(where: { $0 == record.uuid }) {
            recordRepository.create(record) // This has been updated to use the mirrored persistent container
          }
        }
      } catch {
        fatalError("Failed to fetch local records: \(error)")
      }
    }
  }
}
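The ‘CD_’ naming convention and the UUID extraction can be pulled out into plain Swift helpers that are easy to test without CloudKit. This sketch models a record’s fields as a `[String: Any]` dictionary instead of a `CKRecord`, and assumes, as the code above does, that the UUID attribute arrives as a string; `cloudKitName(for:)` and `extractUUIDs(from:attribute:)` are hypothetical names.

```swift
import Foundation

/// Maps a Core Data entity or attribute name to the name used in CloudKit,
/// following the 'CD_' prefix convention described above.
func cloudKitName(for coreDataName: String) -> String {
    return "CD_" + coreDataName
}

/// Extracts UUIDs from raw field dictionaries (stand-ins for CKRecord values),
/// skipping records whose UUID field is missing, not a string, or malformed.
func extractUUIDs(from fields: [[String: Any]], attribute: String = "uuid") -> Set<UUID> {
    let key = cloudKitName(for: attribute)
    return Set(fields.compactMap { field -> UUID? in
        guard let text = field[key] as? String else { return nil }
        return UUID(uuidString: text)
    })
}
```

Skipping malformed values, rather than force-unwrapping them, means one bad record can’t crash the whole migration.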

Your Mileage May Vary

A few random points:

  • Make sure you only run this migration once. I use UserPreferences to track this sort of thing, and I mark it as ‘complete’ unless it failed with a network error.
  • In order to query the new records in CloudKit, you’ll need to pre-create the ‘CD_RecordName’ type in CloudKit and add a Queryable index on the recordName field. This is standard procedure for any CloudKit record type, albeit a bit annoying. Luckily, you don’t need to add any other fields to it; Core Data will handle the rest.
  • I used pseudo-code here because my migration may not be applicable across the board. In Money Master, you’re likely to have up to a few hundred records. If your app has thousands, you’ll need to be much more careful about the data you’re copying down to the devices.
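The run-once bookkeeping from the first point could look something like this minimal sketch. It assumes `UserDefaults` as the backing store (the UserPreferences mentioned above may be a wrapper around something similar), and the `MigrationOutcome` enum, `migrationCompleteKey`, and `runMigrationOnce` names are all hypothetical. Only a network failure leaves the flag unset, so the migration is retried on a later launch.

```swift
import Foundation

// Hypothetical outcome type for the migration.
enum MigrationOutcome {
    case succeeded
    case failedWithNetworkError
    case failedWithOtherError
}

// Hypothetical key name; any stable string works.
let migrationCompleteKey = "cloudKitMigrationComplete"

/// Runs `migration` unless it has already been marked complete, then records the result.
/// Success or a non-network failure marks it complete; a network failure leaves it for a retry.
func runMigrationOnce(defaults: UserDefaults = .standard,
                      migration: () -> MigrationOutcome) {
    guard !defaults.bool(forKey: migrationCompleteKey) else { return }
    let outcome = migration()
    if outcome != .failedWithNetworkError {
        defaults.set(true, forKey: migrationCompleteKey)
    }
}
```

Treating a non-network failure as ‘complete’ avoids retrying a migration that would fail the same way on every launch.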

One More Thing…?

One last tip: if your previous CloudKit integration created a CKSubscription named ‘cloud’… well, that’s going to cause some problems. You’ll get errors from the Core Data infrastructure, and you won’t see any data synced to the new iCloud zone. The solution is simple: delete the existing subscription before creating the NSPersistentCloudKitContainer. It just seems like Apple could have chosen a subscription name that’s slightly less likely to cause this kind of collision…
