Database Subsetting Research

A Redgate team exploring an acquired DataBee solution are hoping to gain a more comprehensive understanding of how customers currently use database subsetting, possibly in conjunction with data masking or instead of virtualization, and of any difficulties that arise when getting started with it.


What is database subsetting?

Database subsetting is creating a copy of a database that contains only a portion of the data, while still being referentially intact. For example, suppose you have a table in a database containing millions of customers, but you only want to work with a database that contains thousands. Simple enough, just copy over a fraction of the rows, right? But then suppose you also have an orders table that contains references to the customers. You’ll need to take only those orders that are related to customers in your sample. As your schema grows more complicated, so does the process of taking a valid subset of the data, and the need for automating the process becomes greater.
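As a rough illustration of the customers and orders case described above, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, the function names `make_source` and `take_subset`, and the "every Nth customer" sampling rule are all assumptions made for this example; it is not how DataBee or any Redgate tool works.

```python
import sqlite3

# Hypothetical two-table schema: orders reference customers.
SCHEMA = """
CREATE TABLE customers (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    total       REAL NOT NULL
);
"""


def make_source(n_customers=1000, orders_per_customer=3):
    """Build an in-memory 'production' database with generated rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO customers (id, name) VALUES (?, ?)",
        [(i, f"customer-{i}") for i in range(1, n_customers + 1)],
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
        [
            (c, 10.0 * k)
            for c in range(1, n_customers + 1)
            for k in range(1, orders_per_customer + 1)
        ],
    )
    conn.commit()
    return conn


def take_subset(source, every_nth=10):
    """Copy every Nth customer, plus only the orders that reference them."""
    target = sqlite3.connect(":memory:")
    target.execute("PRAGMA foreign_keys = ON")  # reject orphaned orders
    target.executescript(SCHEMA)

    # Step 1: sample the parent table.
    customers = source.execute(
        "SELECT id, name FROM customers WHERE id % ? = 0", (every_nth,)
    ).fetchall()
    target.executemany("INSERT INTO customers VALUES (?, ?)", customers)

    # Step 2: follow the foreign key and take only the related child rows.
    orders = source.execute(
        """
        SELECT id, customer_id, total FROM orders
        WHERE customer_id IN (SELECT id FROM customers WHERE id % ? = 0)
        """,
        (every_nth,),
    ).fetchall()
    target.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    target.commit()
    return target


if __name__ == "__main__":
    subset = take_subset(make_source())
    print(subset.execute("SELECT COUNT(*) FROM customers").fetchone()[0], "customers")
    print(subset.execute("SELECT COUNT(*) FROM orders").fetchone()[0], "orders")
    # An empty result means no order points at a missing customer.
    print(subset.execute("PRAGMA foreign_key_check").fetchall())
```

With only two tables the foreign key can be followed by hand. Every additional table that references customers or orders needs its own step in the right order, which is where automation starts to pay off.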

Why would I want a subset of my data?

Common reasons include:

  • For testing, development and training environments, you often want to work with real data. Providing a copy of your entire production database would be too costly in time and disk space, especially if every developer needs their own copy. Subsetting lets you work with data that contains all the links between tables that your programs need to function, at a fraction of the cost.

  • You may have a database that contains data for multiple organisations. Giving a single organisation a copy of the whole database would expose data that they should not have access to. Instead, you can provide a subset containing just the data that is relevant to their organisation.

  • You may want to discard a portion of your data based on some criteria. For example, you may want to clear out data older than a certain date, or data that you are no longer permitted to keep under GDPR.
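The last case, discarding data by a criterion, can be sketched in the same spirit. The example below again uses Python's sqlite3 module and assumes a hypothetical `orders` / `order_items` schema in which child rows are declared with ON DELETE CASCADE, so that removing old orders also removes their items. The names `make_demo` and `purge_older_than` are invented for illustration.

```python
import sqlite3


def make_demo():
    """Create a small in-memory database with dated orders and their items."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # needed for the cascade to fire
    conn.executescript(
        """
        CREATE TABLE orders (
            id        INTEGER PRIMARY KEY,
            placed_on TEXT NOT NULL          -- ISO 8601 date, e.g. 2019-07-15
        );
        CREATE TABLE order_items (
            id       INTEGER PRIMARY KEY,
            order_id INTEGER NOT NULL
                     REFERENCES orders(id) ON DELETE CASCADE,
            sku      TEXT NOT NULL
        );
        INSERT INTO orders (id, placed_on) VALUES
            (1, '2015-03-01'), (2, '2019-07-15'), (3, '2023-01-10');
        INSERT INTO order_items (order_id, sku) VALUES
            (1, 'A'), (1, 'B'), (2, 'C'), (3, 'D'), (3, 'E');
        """
    )
    return conn


def purge_older_than(conn, cutoff):
    """Delete orders placed before `cutoff`; return how many orders went."""
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "DELETE FROM orders WHERE placed_on < ?", (cutoff,)
        )
    return cur.rowcount


if __name__ == "__main__":
    conn = make_demo()
    print(purge_older_than(conn, "2020-01-01"), "orders removed")
    print(conn.execute("SELECT COUNT(*) FROM order_items").fetchone()[0], "items left")
```

ISO-formatted date strings compare correctly as text, which is why a plain `<` works here. In a schema without cascading deletes, the child rows would have to be removed explicitly before their parents.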

We are currently looking to get a better understanding of the demand for database subsetting. If your business has a need for any of the above, or would be interested in database subsetting for any other reason, please get in touch.






LubosM

Comments

2 comments

  • kevriley
    Hi, I would be interested in hearing more about this. One of my customers does exactly that to get a smaller subset of data for developers to have on their local machines, but it's a mammoth task and not smooth at all!
  • AlexJeffreys
    Hi Kev, thanks for getting in touch! I have sent you a PM for further discussion.
