My name is Steven Newhouse. I lead the Technical Services Cluster at EMBL’s
European Bioinformatics Institute where I have responsibility for the IT infrastructure
that we operate. EBI is a global hub for
life science data. Our main activity is to collect data from
wherever in the world it is generated, to store it, to process it, to extract
knowledge from it, and then to redistribute both that knowledge and the raw data back
to the community that contributed it. Our big data infrastructure is distributed
across three data centres, where we store over 150 petabytes of data. To put that in perspective, one petabyte
of data is equivalent to 20 million four-drawer filing cabinets filled with paper. So this is a very large amount of data that
we store on behalf of the global scientific community. Our infrastructure, our collaborations
and the science we want to do are all continuing to grow, so to keep pace we have to keep scaling
our infrastructure. We are always looking to see what our partners
in academia are doing and what is happening in the commercial world with organisations
such as Google, Amazon, Netflix and so forth. One of the big changes over the last few years
is the adoption of what are called cloud-native applications. These are applications built
to scale well on cloud infrastructure and to operate across big data. Moving to this cloud-native world is one of the big focuses of EBI's
strategy over the coming years. Our stored data is increasing by around 40%
each year and, with that, the amount of analysis we need to do is also increasing. So we're investing in our data centre capacity
and in our networking, both to connect our data centres to one another and to connect our data to cloud providers. This will enable us to increase our analysis
capacity and help us undertake the work we need to do to develop our science. Here at EMBL-EBI, we're building these cloud-native
applications to run on big data infrastructure. What we do here will provide a best-practice
model that can be adopted elsewhere in the life sciences community, and will show how a hybrid
cloud approach can build on both our own internal big data infrastructure and that provided
by external cloud providers to really scale out the amount of data analysis that we are
able to do.
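A growth rate of around 40% a year compounds quickly. The following is a minimal Python sketch of that arithmetic. It starts from the roughly 150 PB quoted above and assumes the rate stays constant, which the talk does not claim. The constant rate and the function names `projected_storage_pb` and `doubling_time_years` are illustrative assumptions, not EMBL-EBI's own planning model.

```python
import math

# Figures from the talk: roughly 150 PB stored, growing ~40% a year.
# Assumption (not stated in the talk): the growth rate stays constant.
CURRENT_PB = 150.0
ANNUAL_GROWTH = 0.40


def projected_storage_pb(years: int, start_pb: float = CURRENT_PB,
                         growth: float = ANNUAL_GROWTH) -> float:
    """Compound a constant annual growth rate over the given number of years."""
    return start_pb * (1.0 + growth) ** years


def doubling_time_years(growth: float = ANNUAL_GROWTH) -> float:
    """Years for the stored volume to double at a constant growth rate."""
    return math.log(2.0) / math.log(1.0 + growth)


if __name__ == "__main__":
    print(f"Doubling time: {doubling_time_years():.2f} years")
    for year in range(6):
        print(f"Year {year}: ~{projected_storage_pb(year):.0f} PB")
```

At a constant 40% a year, the stored volume roughly doubles every two years (ln 2 / ln 1.4 ≈ 2.06). This illustrates why storage, networking and analysis capacity all have to keep scaling together.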

In Focus: Big data infrastructure