The Increasing Size of Data


Over the last several years, a large amount of data has become available: large collections of photos, genetic information, network traffic statistics, and more. Modern technologies and cheap storage have made it possible to collect huge datasets. But can we effectively use all this data? The ever-increasing sizes of these datasets make it imperative to design new algorithms capable of sifting through the data with extreme efficiency.

The challenges include capture, storage, search, sharing, transfer, analysis, and visualization. The trend toward larger data sets is driven by the additional insight derivable from analysis of a single large set of related information. As of 2012, the sizes of data sets that were feasible to process in a reasonable amount of time were on the order of exabytes. Scientists regularly run into limitations imposed by large data sets in many areas, including meteorology, genomics, complex physics simulations, and biological and environmental research. The limitations also affect Internet search, finance, and business informatics.

Data sets grow in size in part because they are increasingly collected by ubiquitous information-sensing mobile devices, aerial sensing technologies (remote sensing), software logs, cameras, microphones, radio-frequency identification (RFID) readers, and wireless sensor networks. The world's technological per-capita capacity to store data has roughly doubled every 40 months since the 1980s; as of 2012, about 2.5 exabytes (2.5×10^18 bytes) of data were created every day. The challenge for large enterprises is to determine who should own big-data initiatives that straddle the entire organization.
Big data i...

[...]

...r low-dimensional version of the problem. This algorithm provides large speedups with only a small loss in accuracy. In recent years, several researchers have shown that in many cases approximation does enable reducing the dependence on dimension from exponential to polynomial. In addition, there are many approximate nearest-neighbor algorithms that are more efficient than the exact ones, even when their query time and/or space usage is still exponential in the dimension.
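For concreteness, the standard guarantee meant by "approximate" in this literature (the notation below is the usual convention, not something defined in this report) is the (1+ε)-approximate nearest neighbor: given a query point q and a data set P, return some p ∈ P such that

    dist(q, p) ≤ (1 + ε) · min_{p* ∈ P} dist(q, p*)

so that for small ε the reported point is nearly as close as the true nearest neighbor.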

Approximate nearest-neighbor search can be implemented using methods such as basic k-NN (brute-force search), k-d trees, and locality-sensitive hashing (LSH).
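As a concrete baseline, here is a minimal brute-force k-NN query in Python with NumPy; the function name and the random test data are illustrative, not taken from this report:

```python
import numpy as np

def knn_query(data, query, k):
    """Indices of the k points in `data` nearest to `query`
    (Euclidean), found by exhaustive comparison: O(n*d) per query."""
    diffs = data - query                         # shape (n, d)
    dists = np.einsum('ij,ij->i', diffs, diffs)  # squared distances
    # argpartition selects the k smallest distances without a full sort.
    idx = np.argpartition(dists, k - 1)[:k]      # k nearest, unordered
    return idx[np.argsort(dists[idx])]           # sort those k by distance

# Example usage on random data.
rng = np.random.default_rng(0)
points = rng.standard_normal((1000, 16))         # 1000 points in 16 dimensions
print(knn_query(points, rng.standard_normal(16), k=5))
```

This exact search scales linearly in the number of points, which is precisely the cost that k-d trees and LSH try to avoid.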

1.5 Approaches followed
This report covers three approximate nearest-neighbor algorithms: their study, implementation, results, and related applications (a minimal LSH sketch follows the list):
• Basic k-NN algorithm
• k-d tree algorithm
• Locality-sensitive hashing (LSH)
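The following is a minimal sketch of random-hyperplane LSH for cosine similarity in Python; the class name and all parameter values are illustrative assumptions, not taken from this report:

```python
import numpy as np
from collections import defaultdict

class HyperplaneLSH:
    """Toy locality-sensitive hashing index. Each table hashes a vector
    to the sign pattern of `n_bits` random hyperplanes; similar vectors
    tend to collide in at least one table."""

    def __init__(self, dim, n_tables=8, n_bits=12, seed=0):
        rng = np.random.default_rng(seed)
        # One (n_bits x dim) matrix of random hyperplane normals per table.
        self.planes = rng.standard_normal((n_tables, n_bits, dim))
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.data = []

    def _keys(self, v):
        # Sign pattern of v against each table's hyperplanes, as a bit tuple.
        return [tuple((p @ v > 0).astype(int)) for p in self.planes]

    def add(self, v):
        i = len(self.data)
        self.data.append(v)
        for table, key in zip(self.tables, self._keys(v)):
            table[key].append(i)

    def query(self, q, k=1):
        # Rank only the candidates that collide with q in some table;
        # this pruning is where the speedup over brute force comes from.
        cand = {i for table, key in zip(self.tables, self._keys(q))
                for i in table.get(key, [])}
        best = sorted(cand, key=lambda i: np.linalg.norm(self.data[i] - q))
        return best[:k]  # may return fewer than k if few collisions occur

# Example usage.
rng = np.random.default_rng(1)
index = HyperplaneLSH(dim=32)
for v in rng.standard_normal((2000, 32)):
    index.add(v)
print(index.query(rng.standard_normal(32), k=3))
```

A k-d tree baseline for comparison can be built with scipy.spatial.KDTree, whose query method accepts an eps parameter that permits (1+eps)-approximate answers.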
