In recent years, data streams have become an increasingly important area of research. Common data mining tasks associated with data streams include classification and clustering. Because data streams are both large and dynamic, it is often difficult to obtain real-time stream data with specific properties without the overhead of setting up an infrastructure to generate it. In this paper we introduce the implementation of stream, an R package that provides an intuitive interface for experimenting with data streams and their applications. We built the framework in R, a popular tool for data mining and statistical analysis, so that researchers can easily integrate it into their existing work. The stream package is a general-purpose tool that can model data streams and perform data mining tasks on the generated data. It lets researchers control specific behaviours of the streams in order to create scenarios that may not be easily reproducible in the real world, such as the merging and splitting of clusters. Additionally, it can replay previously requested data for other data mining tasks if needed, or read data streams from other sources and incorporate them into the framework.
Keywords: data stream, data mining, clustering, classification
John Forrest, Stream: A Framework For Data Stream Modeling in R, Distinction Paper, Computer Science and Engineering, SMU, 2011.