Data Ingest

Any object that has versioning capabilities also supports data ingest. Skylark allows these objects to be referred to by their identifiers from external third-party systems, and can keep fieldsets in sync with changes there.

Use Cases

Data Ingest functionality in Skylark helps solve the following problems:

  • Use data in a media asset management or scheduling system to populate Skylark content or video playback data
  • Import content from multiple databases to a single instance of Skylark
  • Import editorial data from a third-party CMS

Technical Description

Data Ingest functionality in Skylark is closely related to its versioning capability.

As well as a history of changes for each language and for non-translatable data, Skylark can maintain a version of each object that represents its state in an upstream system. No historical versions are stored for upstream data.

Skylark also provides APIs to work directly with data source versions. Customised Skylark installations may include additional functionality to interface with third-party systems that make use of these APIs.

Unique Identifier

Skylark uids are not used to interact with data source versions. Instead, data_source_ids are used. These are provided by the upstream system and can be a string of up to 255 characters, which must be unique within each Skylark entity type.
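The constraints on data_source_ids can be checked on the client before anything is sent to Skylark. The following is a minimal sketch, not part of Skylark itself: the function names and the entity type strings are hypothetical, and only the 255-character limit and the per-entity-type uniqueness come from the description above.

```python
MAX_DATA_SOURCE_ID_LENGTH = 255  # limit stated in the description above


def validate_data_source_id(data_source_id):
    """Raise ValueError if the value cannot be used as a data_source_id."""
    if not isinstance(data_source_id, str):
        raise ValueError("data_source_id must be a string")
    if len(data_source_id) > MAX_DATA_SOURCE_ID_LENGTH:
        raise ValueError("data_source_id must be at most 255 characters")
    return data_source_id


def object_key(entity_type, data_source_id):
    """Build a lookup key for an ingested object (hypothetical helper).

    Uniqueness is scoped to the entity type, so the same upstream id
    may appear once per entity type without clashing.
    """
    return (entity_type, validate_data_source_id(data_source_id))
```

With this sketch, object_key("episodes", "mam-001") and object_key("brands", "mam-001") are different keys, which mirrors the rule that an id only has to be unique within one entity type.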

Creating or Updating Skylark Entities

When a PUT request is made to a data ingest version URL, a Skylark object may be created or an existing object updated.

If an object with the given data_source_id does not exist, one is created, using the given data to construct the initial fieldsets. The initial versioned fieldsets in the local source are created at the same time, using the same data.

If an object with the given data_source_id already exists in Skylark, then the existing fieldsets in the upstream data store are updated. If a language has been specified in the request, and no translatable fieldset for the object exists yet in the local source, the new language will be stored in both the local and upstream stores.

Skylark never stores an upstream fieldset with no equivalent local fieldset.
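The create-or-update rules above can be illustrated with a small in-memory model. This is a sketch under stated assumptions rather than Skylark's implementation: IngestedObject and put_data_source_version are hypothetical names, fieldsets are simplified to dictionaries keyed by language (with None standing in for non-translatable data), and an update is assumed to replace the upstream fieldset for that key. Field syncing via data_source_fields is left out.

```python
from dataclasses import dataclass, field


@dataclass
class IngestedObject:
    """Hypothetical stand-in for a Skylark object with two stores."""
    data_source_id: str
    upstream: dict = field(default_factory=dict)  # language -> fields
    local: dict = field(default_factory=dict)     # language -> fields


def put_data_source_version(store, data_source_id, data, language=None):
    """Model a PUT to a data ingest version URL.

    Returns (object, created) where created is True for a new object.
    """
    obj = store.get(data_source_id)
    if obj is None:
        # Unknown id: create the object and seed both stores with the data.
        obj = IngestedObject(data_source_id)
        obj.upstream[language] = dict(data)
        obj.local[language] = dict(data)
        store[data_source_id] = obj
        return obj, True

    # Known id: update the upstream fieldset (assumed here to be replaced).
    obj.upstream[language] = dict(data)
    if language not in obj.local:
        # A new language is stored locally too, so an upstream fieldset
        # never exists without a local equivalent.
        obj.local[language] = dict(data)
    return obj, False
```

In this model, a second PUT for an existing language changes only the upstream copy, while a PUT for a language the object does not yet have populates both stores.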

Syncing the Data Source and Local Versions

Fields in the local store (or a subset of fields) can be synced automatically with their upstream equivalent to keep the current local version (returned by the standard APIs) of the object up to date with changes made in the external data source.

The list of fields to be kept synchronised is stored in the MetaFieldset of an object as a list of field names in data_source_fields.

Whenever the object is updated using the data source APIs, such as during an ingest, the fields on the TranslatableFieldset and NonTranslatableFieldset that are listed in data_source_fields are also updated in the latest version of the object.

If a field name is added to the data_source_fields list during an update, the current value of that field is replaced with the value in the equivalent upstream fieldset. data_source_fields can be updated during a normal object update using either the regular or data ingest APIs. Fields that are not listed in data_source_fields can be modified to contain a different value, and will not be overwritten when the upstream version of the object is updated.
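The syncing behaviour described above can be sketched with plain dictionaries. The two function names below are hypothetical and the fieldsets are reduced to flat dictionaries; the sketch only illustrates which values are copied and when, and is not how Skylark is implemented.

```python
def apply_upstream_update(local, upstream, data_source_fields, new_data):
    """Model an update made through the data source APIs.

    The upstream fieldset always takes the new values. The local fieldset
    only takes them for fields listed in data_source_fields.
    """
    upstream.update(new_data)
    for name in data_source_fields:
        if name in upstream:
            local[name] = upstream[name]


def add_synced_field(local, upstream, data_source_fields, name):
    """Model adding a field name to data_source_fields during an update.

    The current local value is replaced with the upstream value.
    """
    if name not in data_source_fields:
        data_source_fields.append(name)
    if name in upstream:
        local[name] = upstream[name]


# Usage: "title" is synced, "synopsis" holds a local editorial override.
local = {"title": "Pilot", "synopsis": "Hand-written synopsis"}
upstream = {"title": "Pilot", "synopsis": "Feed synopsis"}
synced = ["title"]

apply_upstream_update(local, upstream, synced,
                      {"title": "Pilot (Remastered)", "synopsis": "New feed synopsis"})
# local["title"] now follows upstream; local["synopsis"] is untouched.

add_synced_field(local, upstream, synced, "synopsis")
# local["synopsis"] is now replaced by the upstream value.
```

Keeping the list of synced fields separate from the values is what lets editors override individual fields locally without losing updates to the rest.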

The last_data_ingest field in the MetaFieldset of the object is maintained automatically when the data source APIs are used.

When changing the data_source_id of an object, the values for data_source_fields will be sourced from the existing object before the data_source_id is changed.

Data Sources For Non-Versioned Objects

A number of Skylark objects do not support full versioning functionality but may still originate in an external third-party system, such as Schedules or Customers. These objects have data_source_id and last_data_ingest fields but do not have versioning or data source APIs. These fields can, however, be used with normal filters to coordinate a data ingest process.
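One way an ingest process might use these two fields is to find records that came from an upstream system but were not touched by the latest run. The sketch below works on records already fetched into memory as dictionaries; the function name and record shape are hypothetical, and the actual filter syntax of the Skylark APIs is not shown here.

```python
from datetime import datetime, timezone


def records_missed_by_run(records, run_started_at):
    """Return upstream-sourced records not updated since the run began.

    Each record is assumed to be a dict with data_source_id and
    last_data_ingest keys. Records without a data_source_id did not
    come from an upstream system and are ignored.
    """
    return [
        record for record in records
        if record.get("data_source_id")
        and record["last_data_ingest"] < run_started_at
    ]


# Usage with made-up example data.
run_started_at = datetime(2024, 1, 2, tzinfo=timezone.utc)
schedules = [
    {"data_source_id": "sched-1",
     "last_data_ingest": datetime(2024, 1, 2, 0, 5, tzinfo=timezone.utc)},
    {"data_source_id": "sched-2",
     "last_data_ingest": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"data_source_id": None, "last_data_ingest": None},
]
missed = records_missed_by_run(schedules, run_started_at)
```

Records returned this way are candidates for review, since they may have been removed from the upstream system.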