Working with big data will soon become one of the priorities for large Russian businesses and organizations, and the topic recently drew about 400 attendees to the Big Data 2012 Forum, Russia's first large-scale event devoted to the subject.

In recent years, organizations have been facing explosive data growth, said Sergey Matsotskiy, chairman of IBS, a large Russian systems integrator, who spoke at the event. IDC forecasts that from 2011 to 2015 the global volume of data will increase by 340 percent, while total traffic volume will rise almost threefold and the share of mobile data traffic will increase from 3 percent to 10 percent. New sources of huge data volumes are emerging, such as CRM systems, RFID, mobile devices and satellite navigation systems. Meanwhile, according to Gartner analysts, the majority of organizations lack both the technical capabilities for working with big data and the skills for managing it. Given the traditional thoroughness of higher technical education in Russia, this opens new career opportunities in the global IT market for local specialists, said Matsotskiy.

Held March 22, the Big Data 2012 Forum was organized by Open Systems Publications, publisher of Computerworld Russia and CIO Magazine Russia. Most of the attendees came from businesses that collect and process large volumes of structured and unstructured data, such as banks, telcos and local IT vendors.

Industry observers point to the following signs that an organization has a big data processing problem: data volumes exceed the physical scale-up capacity of the organization's IT infrastructure; there is a perceived need to process large data volumes rapidly; a wide variety of data formats, or of methods for interpreting and analyzing the data, is in use; and the costs of data storage and processing are growing quickly. In all of these cases, organizations are forced to invest in new technologies for storing and processing the data.

"The term big data refers to a new generation of technologies and architectures designed for efficient extraction of useful insights from large volumes of heterogeneous data," stated Gunther Thiel, business development line manager at NetApp EMEA.

"For the first time in the history of the IT industry, a fundamental shift is under way in the notion of information itself: It is now taking the shape of social online environments, multimedia, clickstreams, sensor data, images, email messages and so on," stressed Hartmut Wagner, vice president of information management at HP EMEA.

According to a survey conducted by the forum's organizers, the vast majority of Russian companies have not yet faced the big data problem (or at least do not perceive it as serious). Nevertheless, some businesses recognize that they will have to confront it very soon.

Vyacheslav Arkharov, application platform business development manager at Microsoft Russia, listed the following examples of real-world tasks that may require big data technologies: risk assessment, anti-money-laundering, and trend analysis and forecasting in the financial sector; inquiry examination; Web and social network analysis, advertising intelligence and digital image analysis in the mass media and online content sectors; customer behavior analysis and sales intelligence in both online and traditional commerce; fraud prevention in online games; various national security tasks; genetic and pharmaceutical research; and scientific and educational research.

Further use scenarios for big data technology include assessing the impact of weather and road traffic conditions on cargo delivery and fuel consumption; examining conversation records in call centers for customer behavior analysis; operation and fault analysis in telecom networks; probing the impact of weather changes on energy generation; interpreting smart meter data in electric grids; and analyzing system transaction logs in various verticals, said Sergey Likharev, head of information management solutions at IBM Eastern Europe and Asia.

Solutions for working with big data should be able to provide easy access to the entire body of corporate information; process both structured and unstructured data; map relations between various pieces of data regardless of their format; work with the original data sources to avoid duplicating data; understand the meaning and context of all data; identify similar phone calls, emails, documents and IM messages; and process and analyze data on the fly using predefined rules, according to HP's Wagner.

When addressing big data, it is critical to evaluate the total cost of collecting, storing and processing the data, and the priority, of course, is to increase the return on investment (ROI) in the corresponding technologies, pointed out Nick Rossiter, regional director of Informatica Russia and CIS. This can be done by raising the value of the data or by lowering its cost, said Rossiter. Value is increased primarily by acquiring new business capabilities and advantages (such as faster processing of customer requests, a wider customer audience, measures that reduce customer complaints, a lower risk of fraudulent transactions, higher employee efficiency, etc.). Meanwhile, one of the first steps toward lowering data costs is to optimize and update IT infrastructure and processes, which in turn leads to lower total IT costs.

"Is it possible to derive 10 times more value from data than we can now?" asked Luke Lonergan, co-founder and chief technology officer of Greenplum (now part of EMC). "Definitely yes, by using the data that is usually neglected or not processed due to technical constraints."

The technology mentioned most often at the forum was Apache Hadoop, a distributed computing architecture that automatically replicates data across numerous nodes and can search and analyze the data on all of them. Based on Google's MapReduce programming model, Hadoop enables the analysis of petabytes of unstructured data distributed across a cluster that need not be built from high-end servers. The technology is used by many high-profile companies, including Facebook, Twitter, LinkedIn, Apple, Amazon and Yahoo. Not surprisingly, almost every company whose executives gave keynote speeches at the forum also announced support for Hadoop in some form.
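To make the MapReduce model behind Hadoop more concrete, here is a minimal, single-process Python sketch of the classic word-count job. It is not Hadoop's own (Java) API, and the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical. Each string in the input list stands in for a data block stored on a separate node; in a real cluster the framework would run the map tasks on the nodes holding those blocks and perform the shuffle over the network.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_phase(split: str) -> Iterator[tuple[str, int]]:
    # Map step: emit a (word, 1) pair for every word in one input split.
    for word in split.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle step: group intermediate values by key, which the framework
    # does between the map and reduce phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce step: combine all values for one key into a single result.
    return key, sum(values)


def word_count(splits: list[str]) -> dict[str, int]:
    # Run map over every split (sequentially here, in parallel on a cluster),
    # then shuffle and reduce.
    intermediate = chain.from_iterable(map_phase(s) for s in splits)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    splits = ["big data big value", "data everywhere"]
    print(word_count(splits))
```

The point of the split between phases is that the map and reduce functions hold no shared state, so the framework can distribute them across many inexpensive machines and rerun a failed task on a node that holds a replica of the same data.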

Overall, the forum earned very high praise from its attendees, even though no keynote speaker could cite a completed big data project. Hopefully, the next forum will highlight not only new approaches and technologies but also their practical implementations, including at Russian businesses.