AdventureWorks is the name of Microsoft’s set of sample databases (which can be downloaded here), and kudos to Microsoft for spending the time to create something for us to play with (and not have to create ourselves).
One of the shortcomings of AdventureWorks (not complaining here, just stating a fact) is a lack of data. Especially when testing performance related matters, a few thousand records just isn’t enough. A few folks in the community have been kind enough to share some scripts to enlarge some tables in the OLTP version of AdventureWorks, like Adam Machanic’s script for the TransactionHistory and Product tables, and a script for the SalesOrderHeader and SalesOrderDetail tables written by Jonathan Kehayias. These are both great, and I’d encourage you to have a look at them.
For my general purposes as a “BI guy” however, I spend most of my time with the AdventureWorksDW sample database…and while playing around with columnstore indexes and ROLAP (more of that in a future post) I soon realized that I could use larger fact tables. Needless to say I could not find any resources on that, so I had to create something for that purpose…and in case anybody else might find it useful, I’d like to share.
The Adventure Works Fact Generator script will create a new FactInternetSales_Big table & generate about 10 million records in it.
Here are a few things you’ll need to know about the script:
- Order numbers are randomly generated, with a number between 1 and 10,000,000 to avoid duplicates as much as possible. To make these generated orders easier to identify, the order number is prefixed with “GSO”.
- Sales order line numbers are simply taken from their originating records, which means that you could get an order with missing or out of sync line numbers. The amount of time it would have required to circumvent this just wasn’t worth it.
- Random quantities between 1 and 10 are used to influence amounts and get a reasonable distribution of different sized orders. However, some values like the freight amount was left as is.
- A random number of days (between -365 and 365) are added to the order date, due date and shipped date.
- The generated dataset is filtered to insert only the dates available in the date dimension.
- Products, product costs and currencies are not changed.
Hope you find it useful!!