Abstract: Real-time online data processing is quickly becoming an essential tool in the analysis of social media for political trends, advertising, public health awareness programs and policy making. Traditionally, processes associated with offline analysis are productive and efficient only when the data collection is a one-time process. Currently, cutting edge research requires real-time data analysis that comes with a set of challenges, particularly the efficiency of continuous data fetching within the context of present NoSQL and relational databases. In this paper, we demonstrate a solution to effectively adsress the challenges of real-time analysis using a configurable Elasticsearch search engine. We are using a distributed database architecture, pre-build indexing and standardizing the Elasticsearch framework for large scale text mining. The results from the query engine are visulized in almost real-time.
Abstract: When conducting data analysis in the twenty-first century, social media is crucial to the analysis due to the ability to provide information on a variety of topics such as health, food, feedback on products, and many others. Presently, users utilize social media to share their daily lifestyles. For example, travel locations, exercises, and food are common subjects of social media posts. By analyzing such information collected from users, health of the general population can be gauged. This analysis can become an integral part of federal efforts to study the health of a nation's people on a large scale. In this paper, we focus on such efforts from a Canadian lens. Public health is becoming a primary concern for many governments around the world. It is believed that it is necessary to analyze the current scenario within a given population before creating any new policies. Traditionally, governments use a variety of ways to gauge the flavor for any new policy including door to door surveys, a national level census, or hospital information to decide health policies. This information is limited and sometimes takes a long time to collect and analyze sufficiently enough to aid in decision making. In this paper, our approach is to solve such problems through the advancement of natural language processing algorithms and large scale data analysis. Our in-depth results show that the proposed method provides a viable solution in less time with the same accuracy when compared to traditional methods.