A DataFrame is a data structure that organizes data into a 2-dimensional table of rows and columns, much like a spreadsheet. DataFrames are one of the most common data structures used in modern data analytics because they are a flexible and intuitive way of storing and working with data.
This article covers the following topics:
A dense dataframe is a matrix in which the first column holds the row identifier or timestamp and each remaining column holds the values of one item. The format of a dense dataframe is as follows:
rowIdentifier/timestamp Item1 Item2 ... ItemN
timestamp | Bread | Jam | Butter | Books | Pencil |
---|---|---|---|---|---|
1 | 3 | 1 | 2 | 0 | 0 |
2 | 7 | 2 | 0 | 10 | 20 |
3 | 0 | 0 | 3 | 0 | 0 |
4 | 4 | 0 | 0 | 0 | 0 |
In the above dataframe (or table), the first transaction (or row) states that a customer purchased 3 packets of Bread, 1 bottle of Jam, and 2 packets of Butter at timestamp 1. The second transaction states that a customer purchased 7 packets of Bread, 2 bottles of Jam, 10 Books, and 20 Pencils at timestamp 2. The remaining transactions can be read in the same way.
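As an illustration, the dense dataframe above could be built with pandas roughly as follows. This is a minimal sketch: the use of pandas and the variable names `dense_df` and `purchased` are assumptions made for this example, not something the article prescribes.

```python
import pandas as pd

# Dense layout: one row per timestamp, one column per item.
# The values are copied from the table above.
dense_df = pd.DataFrame(
    {
        "timestamp": [1, 2, 3, 4],
        "Bread": [3, 7, 0, 4],
        "Jam": [1, 2, 0, 0],
        "Butter": [2, 0, 3, 0],
        "Books": [0, 10, 0, 0],
        "Pencil": [0, 20, 0, 0],
    }
)

# Read the first transaction: keep only the items with a non-zero quantity.
first_row = dense_df.iloc[0].drop("timestamp")
purchased = first_row[first_row > 0].to_dict()
print(purchased)
```

A zero in a cell means the item was not purchased at that timestamp, which is why the filter `first_row > 0` recovers the purchased items.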
A sparse dataframe is a three-column matrix (which is itself fully populated) in which the first column holds the row identifier or timestamp, the second column holds the item, and the third column holds the value of that item. The format of a sparse dataframe is as follows:
rowIdentifier/timestamp Item Value
A sparse dataframe generated from the customer purchase database is as follows:
timestamp | Item | Value |
---|---|---|
1 | Bread | 3 |
1 | Jam | 1 |
1 | Butter | 2 |
2 | Bread | 7 |
2 | Jam | 2 |
… | … | … |
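One way to derive the sparse layout from the dense one is pandas' `melt`, sketched below. The choice of pandas, the variable names, and the decision to drop zero-valued entries are assumptions for this example; the last of these follows the table above, which lists only purchased items.

```python
import pandas as pd

# Dense dataframe from the customer purchase example.
dense_df = pd.DataFrame(
    {
        "timestamp": [1, 2, 3, 4],
        "Bread": [3, 7, 0, 4],
        "Jam": [1, 2, 0, 0],
        "Butter": [2, 0, 3, 0],
        "Books": [0, 10, 0, 0],
        "Pencil": [0, 20, 0, 0],
    }
)

# Unpivot the item columns into (timestamp, Item, Value) rows,
# drop the zero entries, and group the rows by timestamp.
sparse_df = (
    dense_df.melt(id_vars="timestamp", var_name="Item", value_name="Value")
    .query("Value > 0")
    .sort_values("timestamp", kind="stable")  # stable sort keeps item order within a timestamp
    .reset_index(drop=True)
)
print(sparse_df)
```

The sparse form stores one row per non-zero cell, so it tends to be more compact than the dense form when most items are absent from most transactions.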
Dense dataframe to transactional database
Sparse dataframe to transactional database
Dense dataframe to temporal database
Sparse dataframe to temporal database
Dense dataframe to utility database
Sparse dataframe to utility database
Dense dataframe to fuzzy database
Sparse dataframe to fuzzy database
Dense dataframe to uncertain database
Sparse dataframe to uncertain database
Dense dataframe to sequence database
Sparse dataframe to sequence database
Dense dataframe to geo-referenced database
Sparse dataframe to geo-referenced database
Dense dataframe to multiple timeseries
Sparse dataframe to multiple timeseries