What are the types of Data Sync?
Choose how your data is synchronized during extraction.
What are Data Sync Types?
In Nekt, data sync types determine how data is retrieved from your source during extraction and synchronized with the Catalog. Whether you need to fetch only updated records, the entire current state, or track changes through logs, Nekt provides flexible options to meet your data integration needs.
Types of Data Sync
Incremental Sync
With Incremental Sync, only new or updated data is fetched each time an extraction occurs.
- Best for: Workflows with frequent updates or large volumes of data, where efficiency is a priority.
- Considerations: Incremental extractions do not capture records deleted in the source, so those records remain in the lake until a Full Sync runs.
Tip: When using Incremental Sync, schedule periodic Full Syncs to keep the lake consistent with the source.
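A common way to implement incremental extraction is to keep a "bookmark" (the highest change timestamp seen so far) and fetch only rows changed since then. The sketch below is a generic illustration, not Nekt's implementation. The function name `incremental_sync`, the `id` and `updated_at` fields, and the in-memory `lake` dictionary are all hypothetical. A real connector would filter at the source (for example, with a query on the change timestamp) rather than scanning a list. The example also shows why deletions survive: a deleted row never appears in the changed set, so nothing removes it.

```python
from datetime import datetime

def incremental_sync(source_rows, lake, bookmark):
    """Upsert rows changed since `bookmark` into `lake`; return the new bookmark.

    Assumes each source row has a primary key `id` and an `updated_at`
    timestamp that the source bumps on every change (hypothetical schema).
    """
    # `>=` re-reads rows that share the bookmark timestamp; the key-based
    # upsert below makes those repeats harmless.
    changed = [row for row in source_rows if row["updated_at"] >= bookmark]
    for row in changed:
        lake[row["id"]] = row  # insert new rows, overwrite updated ones
    if changed:
        bookmark = max(row["updated_at"] for row in changed)
    return bookmark

# First extraction: the lake is empty, so every row is new.
source = [
    {"id": 1, "name": "a", "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "name": "b", "updated_at": datetime(2024, 1, 2)},
]
lake = {}
bookmark = incremental_sync(source, lake, datetime.min)

# Before the next extraction, row 1 is updated and row 2 is deleted at the source.
source = [{"id": 1, "name": "a2", "updated_at": datetime(2024, 1, 3)}]
bookmark = incremental_sync(source, lake, bookmark)

print(sorted(lake))     # [1, 2]: the deleted row 2 is still in the lake
print(lake[1]["name"])  # a2
```

Comparing with `>` instead of `>=` avoids re-reading rows, but it can miss rows committed later with the same timestamp as the bookmark, which is why the sketch accepts a few duplicate reads instead.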
Full Sync
With Full Sync, the entire current state of the data source is fetched during each extraction.
- Best for: Scenarios where the Catalog must always reflect the exact state of the source, including records that have been deleted.
- Considerations: This method requires more time and resources compared to Incremental Sync, especially for large datasets.
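By contrast, a full sync rebuilds the table from a complete snapshot of the source, so deletions are reflected automatically. This minimal sketch reuses the same hypothetical row shape (a dictionary keyed by `id`); `full_sync` is an illustrative name, not a Nekt API.

```python
def full_sync(source_rows):
    """Rebuild the table from a complete snapshot of the source."""
    # Every extraction reads all rows, so cost grows with table size,
    # but rows deleted at the source simply do not appear in the result.
    return {row["id"]: row for row in source_rows}

lake = {1: {"id": 1, "name": "a"}, 2: {"id": 2, "name": "b"}}  # previous state
source = [{"id": 1, "name": "a2"}, {"id": 3, "name": "c"}]        # row 2 deleted
lake = full_sync(source)

print(sorted(lake))  # [1, 3]
```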
CDC Log-Based Sync
CDC (Change Data Capture) Log-Based Sync reads the source's change logs (such as a database transaction log) to capture inserts, updates, and deletions in near real-time.
- Best for: Scenarios where high-frequency data updates and deletions need to be captured efficiently without repeatedly scanning the entire dataset.
- Key Benefits:
  - Near real-time updates.
  - Efficient handling of large datasets with minimal resource usage.
  - Maintains a detailed history of changes for auditing and tracking purposes.
Note: Ensure your data source supports CDC and has the necessary logging enabled for this sync type.
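Conceptually, log-based CDC reads an ordered stream of change events and applies each one to a replica, recording the log position it has reached so the next read can resume from there. The sketch below assumes a hypothetical event shape (`lsn`, `op`, `key`, `row`) loosely modeled on a database log position; real CDC tools and Nekt's connectors define their own formats. The `history` list stands in for the change history mentioned above.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ChangeEvent:
    lsn: int                    # position in the source's change log (hypothetical)
    op: str                     # "insert", "update", or "delete"
    key: int                    # primary key of the affected row
    row: Optional[dict] = None  # new row image; None for deletes

@dataclass
class Replica:
    rows: dict = field(default_factory=dict)
    history: list = field(default_factory=list)  # audit trail of applied changes
    last_lsn: int = 0                            # resume point for the next read

    def apply(self, events):
        for ev in sorted(events, key=lambda e: e.lsn):
            if ev.lsn <= self.last_lsn:
                continue  # already applied, so re-reading the log is safe
            if ev.op == "delete":
                self.rows.pop(ev.key, None)
            elif ev.op in ("insert", "update"):
                self.rows[ev.key] = ev.row
            else:
                raise ValueError(f"unknown operation: {ev.op}")
            self.history.append(ev)
            self.last_lsn = ev.lsn

replica = Replica()
replica.apply([
    ChangeEvent(1, "insert", 10, {"id": 10, "status": "new"}),
    ChangeEvent(2, "insert", 11, {"id": 11, "status": "new"}),
    ChangeEvent(3, "update", 10, {"id": 10, "status": "paid"}),
    ChangeEvent(4, "delete", 11),
])
replica.apply([ChangeEvent(4, "delete", 11)])  # duplicate delivery is ignored

print(replica.rows)          # {10: {'id': 10, 'status': 'paid'}}
print(len(replica.history))  # 4
```

Unlike the incremental approach, the delete event removes row 11 immediately, and only the changed rows are processed rather than the whole table.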
Additional Full Sync
If you choose Incremental Sync, you can schedule periodic Full Syncs to ensure the Catalog remains fully aligned with the source, including any deleted records.
Why Schedule Additional Full Syncs?
- Advantages: Keeps the Catalog up-to-date with the source while leveraging the efficiency of Incremental Sync for most operations.
- Challenges: Full Syncs require more time and resources. For small datasets, this might not be an issue, but for larger datasets, it can become costly.
Recommendations:
- Frequency Matters: Choose a frequency that balances resource usage with the importance of having a fully synchronized Catalog.
- Evaluate Your Needs: If your workflow doesn’t need deletions reflected in the Catalog right away, a less frequent Full Sync might suffice.
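One simple way to reason about frequency is to make every N-th extraction a Full Sync and the rest incremental. The rule below is a hypothetical illustration for estimating the mix, not a description of Nekt's scheduling options; `plan_sync` and `full_sync_every` are made-up names. With hourly extractions and a daily Full Sync, one run in 24 reads the whole table.

```python
def plan_sync(run_number: int, full_sync_every: int) -> str:
    """Return which sync to run for a 1-based extraction run number."""
    if full_sync_every < 1:
        raise ValueError("full_sync_every must be >= 1")
    return "full" if run_number % full_sync_every == 0 else "incremental"

# Two days of hourly extractions with a Full Sync once a day (every 24th run).
plan = [plan_sync(n, full_sync_every=24) for n in range(1, 49)]
print(plan.count("full"), plan.count("incremental"))  # 2 46
```

Raising `full_sync_every` lowers cost but lengthens the window in which deleted records can linger in the lake.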
Choosing the Right Sync Type
When selecting a sync type for your data source, consider the following:
- Incremental Sync: Use for large, frequently updated datasets where efficiency is key.
- Full Sync: Use when the Catalog must exactly mirror the source, including deletions.
- CDC Log-Based Sync: Use when your source supports CDC and you need near real-time updates with precise tracking of changes.
- Additional Full Sync: Combine Incremental Sync with periodic Full Syncs for a balanced approach.
By selecting the appropriate sync strategy, you can optimize both performance and data integrity, ensuring your Catalog meets your organization’s needs.