We recently discussed how we monitor and collect transactional credit card information with AWS Redshift, and many of the same methodologies apply to saving and searching raw email receipt information.

Many companies, such as Slice Technologies, offer a consumer product such as order tracking or discount alerts that requires users to sign up with an email account. These companies then sell anonymized data to brands and investors, which helps them track consumer purchase behaviors and trends. This purchase data covers online purchases only, but it gives investors cart-level information, such as Nike shoes purchased at Footlocker, which is not available in credit card transactional data. Cart-level information adds another degree of freedom and allows much more in-depth tracking of customer behavior, because each purchase can be broken down into the individual items in the cart rather than just a merchant and a total.

Companies usually give marketers or investors access to the data through flat files, which may need to be cleaned depending on the data provider and use case. Each row is a single cart-level line item with columns like these:

```
retailer_name, transaction_date, item_name, item_amount, purchase_id, consumer_id, total_purchase_amount
```
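Since the flat files may need cleaning, here is a minimal Python sketch of parsing and normalizing rows with the columns above. It assumes a comma-separated file with a header row, ISO `YYYY-MM-DD` dates, and amounts that may carry a `$` sign or thousands separators; the `LineItem` class and `parse_line_items` function are hypothetical names, and a real provider's format may differ.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal

# Column order taken from the example line item; everything else is assumed.
COLUMNS = [
    "retailer_name", "transaction_date", "item_name", "item_amount",
    "purchase_id", "consumer_id", "total_purchase_amount",
]


@dataclass
class LineItem:
    retailer_name: str
    transaction_date: date
    item_name: str
    item_amount: Decimal
    purchase_id: str
    consumer_id: str
    total_purchase_amount: Decimal


def parse_amount(raw: str) -> Decimal:
    # Assumed cleanup: drop whitespace, a leading "$", and thousands separators.
    return Decimal(raw.strip().lstrip("$").replace(",", ""))


def parse_line_items(text: str, date_format: str = "%Y-%m-%d") -> list[LineItem]:
    # skipinitialspace handles the ", " separators seen in the example header.
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    header = [h.strip() for h in next(reader)]
    if header != COLUMNS:
        raise ValueError(f"unexpected header: {header}")
    items = []
    for row in reader:
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        if len(row) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(row)}: {row}")
        f = dict(zip(COLUMNS, (cell.strip() for cell in row)))
        items.append(LineItem(
            retailer_name=f["retailer_name"],
            transaction_date=datetime.strptime(f["transaction_date"], date_format).date(),
            item_name=f["item_name"],
            item_amount=parse_amount(f["item_amount"]),
            purchase_id=f["purchase_id"],
            consumer_id=f["consumer_id"],
            total_purchase_amount=parse_amount(f["total_purchase_amount"]),
        ))
    return items
```

Using `Decimal` rather than `float` keeps currency amounts exact when they are later summed per purchase or per retailer.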

This data can then be loaded into Redshift or another data store and queried as we showed in the transactional credit card information article.
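As an illustration of the kind of cart-level question this data answers, the sketch below counts, per retailer, how many purchases contained a given brand and how much was spent on it. It uses Python's built-in `sqlite3` as a local stand-in for Redshift so it runs anywhere; the `receipt_items` table name, the column types, and matching a brand by an `item_name` prefix are all assumptions for illustration. Note that `LIKE` is case-insensitive for ASCII in SQLite but case-sensitive in Redshift.

```python
import sqlite3

# Hypothetical table; DECIMAL/VARCHAR types are chosen to be Redshift-friendly
# and are also accepted by SQLite.
DDL = """
CREATE TABLE receipt_items (
    retailer_name VARCHAR(256),
    transaction_date DATE,
    item_name VARCHAR(256),
    item_amount DECIMAL(12, 2),
    purchase_id VARCHAR(64),
    consumer_id VARCHAR(64),
    total_purchase_amount DECIMAL(12, 2)
)
"""

# Purchases containing the brand vs. all purchases, plus brand spend, per retailer.
BRAND_QUERY = """
SELECT retailer_name,
       COUNT(DISTINCT CASE WHEN item_name LIKE :pattern THEN purchase_id END) AS brand_purchases,
       COUNT(DISTINCT purchase_id) AS all_purchases,
       SUM(CASE WHEN item_name LIKE :pattern THEN item_amount ELSE 0 END) AS brand_spend
FROM receipt_items
GROUP BY retailer_name
ORDER BY retailer_name
"""


def load_rows(conn: sqlite3.Connection, rows) -> None:
    # Each row is a tuple in the same column order as the flat file.
    conn.executemany("INSERT INTO receipt_items VALUES (?, ?, ?, ?, ?, ?, ?)", rows)


def brand_share_by_retailer(conn: sqlite3.Connection, brand: str):
    # Crude brand match: item names that start with the brand name.
    return conn.execute(BRAND_QUERY, {"pattern": brand + "%"}).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    load_rows(conn, [  # hypothetical sample rows
        ("Footlocker", "2016-11-25", "Nike Air Max", 120.0, "p1", "c1", 132.0),
        ("Footlocker", "2016-11-25", "Nike socks", 12.0, "p1", "c1", 132.0),
        ("Footlocker", "2016-11-26", "Adidas Superstar", 80.0, "p2", "c2", 80.0),
    ])
    for row in brand_share_by_retailer(conn, "Nike"):
        print(row)
```

Counting distinct `purchase_id` values is what makes this a cart-level metric: two Nike items in the same cart count as one Nike purchase, which a card transaction alone could not reveal.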

Update: 11/30/2016

Amazon released what looks like an awesome tool today called Athena, which lets a user define a schema and query data stored in S3 directly:

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.
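A sketch of what defining a schema over receipt files in S3 might look like: the function below builds a Hive-style `CREATE EXTERNAL TABLE` statement for the line-item columns, and a second function submits SQL with boto3's Athena client. The bucket path, table name, and column types are hypothetical. The simple `ROW FORMAT DELIMITED` format assumes the files were already cleaned to unquoted, comma-separated values with ISO dates and no currency symbols.

```python
# Column names from the example line item; the Athena types are assumptions.
COLUMN_TYPES = [
    ("retailer_name", "string"),
    ("transaction_date", "date"),
    ("item_name", "string"),
    ("item_amount", "decimal(12,2)"),
    ("purchase_id", "string"),
    ("consumer_id", "string"),
    ("total_purchase_amount", "decimal(12,2)"),
]


def athena_table_ddl(table: str, s3_location: str) -> str:
    # LOCATION must point at a folder-like prefix, not a single file.
    if not (s3_location.startswith("s3://") and s3_location.endswith("/")):
        raise ValueError("s3_location should look like s3://bucket/prefix/")
    cols = ",\n".join(f"  {name} {typ}" for name, typ in COLUMN_TYPES)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n{cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{s3_location}'\n"
        "TBLPROPERTIES ('skip.header.line.count'='1')"  # skip the header row
    )


def run_athena_query(sql: str, database: str, output_location: str) -> str:
    # Requires boto3 and AWS credentials; results are written to output_location.
    import boto3

    client = boto3.client("athena")
    response = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

With hypothetical names, `run_athena_query(athena_table_ddl("receipt_items", "s3://example-bucket/receipts/"), "default", "s3://example-bucket/athena-results/")` would register the table, after which ordinary `SELECT` queries run against the files in place and are billed per query.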

We are living in a golden age for data intelligence, and for the most part this knowledge has not yet entered the investment world, which creates really exciting opportunities for early adopters of these tools.