Overview and aims
The digital revolution has made data easy to capture and inexpensive to store. The volume of stored data is growing rapidly, with databases typically doubling in size every 20 months. For example, 2.3 trillion GB of data is created per day, with IBM expecting this to increase to 43 trillion GB (40 zettabytes) a day by 2020. As a result, traditional data management and analysis techniques are no longer adequate for storing and analysing this vast collection of data, leading to sharply rising demand (a 912% increase) for professionals with expertise in managing and analysing big data sets.
It is essential that businesses are equipped, in terms of both business infrastructure and an appropriately skilled workforce, to capture, transform and analyse this data to produce useful information, or business intelligence, that enhances the business in some tangible way (e.g. improving the value chain, enhancing customer service, solving business problems).
This module therefore introduces you to the processes, techniques and technologies businesses use to develop their infrastructure so that they are able to manage big data and transform it into useful business intelligence. It also raises a number of key challenges faced by companies exploiting big data.
The module content is designed to develop and structure your understanding according to the stages an organisation moves through to develop and manage the infrastructure necessary to derive business value from large volumes of data, and is organised as follows:
- The big picture of big data
- Lifecycle of Knowledge Discovery in Databases (KDD)
- Infrastructure opportunities and challenges of big data
- Overview of database technology and data warehouses
- What are data warehouses and why use them?
- The role of RDBMS in big data environments
- The importance of non-relational databases (e.g. key-value pair databases, document databases (MongoDB), columnar databases (HBase), and graph and spatial databases)
- Introducing Hadoop and MapReduce
- Technology and infrastructure for managing big data
- Overview of the Internet environment, protocols and standards (TCP/IP, HTML, XML)
- Large-scale networking and communication
- Distributed and service-orientated architectures (SOA)
- Web services, middleware and vendor products
- Cloud computing
- The Internet of Things and the Semantic Web
After studying this module you should be able to:
- critically evaluate, using current thinking in the field, the key approaches to and challenges of managing and utilising big data in organisations, with particular reference to the role of relational and non-relational database technologies
- propose how data management technologies can be applied to manage data effectively and help solve a given business problem
- demonstrate a critical understanding of service-orientated and cloud-based technologies for enterprise/internet solutions to managing big data
- investigate how new and emerging web and business intelligence techniques use cloud computing and storage capabilities to create new application opportunities for service providers
This module will help you gain the skills and qualities to:
- prepare a business case for the deployment of distributed or cloud services-based solutions for managing big data for an organisation, taking into account costs, benefits, risks and relevant business goals
- identify key aspects of the implementation, or provision by a third party, of a data warehouse using distributed or cloud services, and make specific recommendations for managing those aspects
- communicate work outcomes persuasively and professionally, as suited to your identified audience