The CORBEL Medical Infrastructure Users Forum: Paris, 15 October

The CORBEL Medical Infrastructure Users Forum will take place on 15 October in Paris.

The forum is meant to promote collaboration between research communities, funding bodies and medical research infrastructures. This year the forum will focus on emerging technologies in the context of big data and personalised/stratified medicine.

The conference will be an opportunity to raise awareness of medical RIs and to foster exchange with scientific communities.

The October 2018 MIUF meeting is intended to discuss:

  • the structuring of medical research communities at the pan-European level
  • the emerging needs of medical research projects in the context of big data and the
    personalised/stratified medicine approach
  • the challenges raised by the development and deployment of data services

Registration is open.

EGI will participate in the CODATA-RDA Research Data Science Summer School

The CODATA-RDA Research Data Science Summer School and Research Data Science Advanced Workshops will run for their third and second year respectively at the International Centre for Theoretical Physics, Trieste.

The CODATA-RDA Research Data Science Summer School will take place on 6-17 August and will be followed by the advanced workshops on 20-24 August 2018.

The CODATA-RDA Research Data Science Summer School provides training in the foundational skills of Research Data Science. These include the principles and practice of Open Science, research data management and curation, the use of a range of data platforms and infrastructures, large-scale analysis, statistics, visualisation and modelling techniques, and more.

Giuseppe La Rocca, Technical Outreach Expert at the EGI Foundation, and Gergely Sipos, Customer and Technical Outreach Manager at the EGI Foundation, will participate in the summer school and give a tutorial on how the EGI Jupyter Notebook can be used to perform data analysis in the EGI Federated Cloud.

The CODATA-RDA Research Data Science Summer School is open to participants in any research discipline and registration is free of charge.

Please see all the details of the events.

The European HTCondor Workshop: 4-7 September

The European HTCondor Workshop will take place this year in the United Kingdom, from 4 to 7 September 2018, hosted by Rutherford Appleton Laboratory (RAL) in Oxfordshire with help from the STFC Scientific Computing Department and GridPP UK project.

The workshop will be an excellent occasion to learn about HTCondor from the source (the developers!), to exchange experiences and plans with your colleagues, and to provide your feedback to the experts. Participation is open to anyone interested in HTCondor.

A reduced early-bird registration fee will apply until 31 July.

See all the details of the workshop.

The European Big Data Value Forum 2018: call for workshops

The European Big Data Value Forum (EBDVF) is a key European event for industry professionals, business developers, researchers, and policy makers to discuss the challenges and opportunities of the European data economy and data-driven innovation in Europe.

As part of its 2018 Presidency of the Council of the European Union, Austria will host the second European Big Data Value Forum from 12 to 14 November. Keynotes and presentations will cover cutting-edge industrial applications of Big Data technologies, artificial intelligence, innovative business cases of the data economy, inspiring visions of the future, and insights on EU policy-making and R&D&I funding in this area.

A call for workshops for EBDVF 2018 Day 3 (November 14) is now open until 12 July.

The collaborative workshops will open up BDVA and BDV PPP activities to all stakeholders and citizens interested in contributing to the European Big Data Value Ecosystem. Over 12 workshops and networking activities are expected to be organised during this day.
If you are interested in organising a workshop at EBDVF 2018, please submit your draft proposal by 12 July.

See all the details of the event.

EGI Newsletter: Summer Issue

We have published a new issue of the Inspired Newsletter, now available online and in PDF format.

In this issue:

Happy reading and don’t forget to subscribe!

DevSecOps: Including Security in the Continuous Delivery Pipeline

This article was written by Bruce Becker, Senior Operations Officer at the EGI Foundation, and is re-posted here with permission. You can read the original article on Medium.

I have been thinking about how to include a vulnerability scan in a pipeline to deliver applications to EGI FedCloud sites. It goes a little like this.

The big picture

A CSIRT’s job is never done! A distributed computing platform is inherently open to risks, even more so a collaborative platform. Ensuring that the platform is safe and secure for users is a thankless, full-time job. The attack surface can be very large indeed when one considers a platform like FedCloud — there are several layers to it which may provide vectors for exploits. While the majority of these can be locked down by the operators of these sites, at the end of the day, they are still used by … well, users.

Whose cloud is it anyway?

There is the usual thin line to tread between usability, ease of access, and security. Much of the appeal of a thing like FedCloud is the freedom of users — and the communities which they belong to — to define their own applications and workload scenarios, molding the basic infrastructure into something they are comfortable with. In essence, by providing a common IaaS or PaaS layer, FedCloud allows users to deploy arbitrary applications, at their own speed, under their own control. Of course, with great freedom comes great responsibility.

There is an inherent difference between users and operators: the former are trying to optimise their usage of an infrastructure, while the latter are trying to optimise its stability. It’s not that either of these players is malicious per se, but their different priorities generate a natural conflict, one which perhaps cannot be removed entirely, but which can be mediated.

Something to talk about

How could subtle changes to the environment improve the relationship between operators and users? Perhaps the first positive step is to surface issues before they become a problem. The second is to provide a common language for announcing vulnerabilities and clear, easy-to-execute instructions for mitigating them when detected. With Dev (the users), Sec (the CSIRT) and Ops (the infrastructure) all on the same page, any conflicts can be discussed in an objective manner.

Prevention is better than cure

Currently, EGI CSIRT does a great job of scanning the endpoints of the infrastructure to detect vulnerabilities. This is necessary, since vulnerabilities are not a static set: new ones are found continuously. Fixing machines that are already deployed is a necessity in order to provide a secure platform, but what about the new applications that are built and deployed by user communities? Wouldn’t it be nice if these applications could be checked before they were deployed? In a perfect world, there would be a more-or-less well-defined pipeline through which applications could flow before landing in the production environment.

Typically, this would include all the great things that you can imagine in a pipeline: continuous integration, testing, and delivery. It would be awesome if we could surface vulnerabilities or security risks at the same time as surfacing run-time or deploy-time issues. Put differently, we wouldn’t deploy a broken application, so why deploy an insecure one?
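
To make that concrete, here is a minimal sketch of the idea in Python: a pipeline runner in which the security scan is just another stage, and a failed scan blocks delivery exactly like a failed test. The stage names and commands are purely illustrative placeholders, not an actual EGI pipeline definition.

    import subprocess
    import sys

    # Each stage is a shell command. All names and commands here are
    # hypothetical placeholders, not real EGI tooling.
    STAGES = [
        ("build", "make all"),
        ("unit-tests", "make test"),
        ("vulnerability-scan", "scan-image my-app:latest"),
        ("deliver", "push-image my-app:latest"),
    ]

    def run_pipeline():
        for name, command in STAGES:
            print(f"--- stage: {name} ---")
            if subprocess.run(command, shell=True).returncode != 0:
                # A failed scan aborts delivery exactly like a failed test.
                sys.exit(f"stage '{name}' failed; not delivering")

    if __name__ == "__main__":
        run_pipeline()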

How to spot a vulnerability

That of course begs the question: How do we know that an application is insecure?

A naive answer appears to be "Just scan the damn things for known vulnerabilities". Indeed, there are several tools out there for doing this, including Pakiti, Clair and others. There are also tools for delivering "compliance as code", particularly InSpec and Testinfra. These tools typically compare installed packages against a vulnerability database. They are designed to check OS packages. They may also work with some language ecosystems like Ruby and Node (maybe not so much with Python or Go), but that relies on specific packages which can be matched against a vulnerability database. These scanners do not actually do penetration testing, as network scanning systems or more advanced penetration testing suites do.
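
To sketch the package-matching idea in a few lines of Python: the query below goes to the public OSV.dev vulnerability database, which is my choice for illustration rather than one of the tools named above, and the package name and version are merely examples.

    import json
    import urllib.request

    def known_vulnerabilities(name, version, ecosystem="PyPI"):
        """Ask OSV.dev which known vulnerabilities affect this exact version."""
        query = {"package": {"name": name, "ecosystem": ecosystem},
                 "version": version}
        request = urllib.request.Request(
            "https://api.osv.dev/v1/query",
            data=json.dumps(query).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response).get("vulns", [])

    # Example: an old Jinja2 release with published advisories
    for vuln in known_vulnerabilities("jinja2", "2.4.1"):
        print(vuln["id"], vuln.get("summary", ""))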

And herein lies the catch: what if your applications are not delivered as packages? This is indeed the case with, e.g., CODE-RADE, where everything is built from scratch and no "packages" are installed. We could, for example, tag builds according to the versions of the source code built, and then match those against the CVE databases.
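
As it happens, the OSV.dev database used in the sketch above can also be queried by source commit hash, which fits this build-from-scratch case quite naturally: tag the build with the exact revision that was compiled, then ask whether that revision is known to be vulnerable. The commit hash below is a placeholder.

    import json
    import urllib.request

    def vulnerabilities_for_commit(commit_sha):
        """Ask OSV.dev whether a specific source revision is known-vulnerable."""
        request = urllib.request.Request(
            "https://api.osv.dev/v1/query",
            data=json.dumps({"commit": commit_sha}).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response).get("vulns", [])

    # e.g. the exact git revision a build was tagged with (placeholder hash)
    print(vulnerabilities_for_commit("0123456789abcdef0123456789abcdef01234567"))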

Although this is a design feature of CODE-RADE, it may be a mere convenience for many use cases. Users may simply hack their application into shape until it works, then tag a VM or container and call it a day. Detecting vulnerabilities introduced in such applications is going to be a risky business unless a true penetration testing suite can be introduced to the delivery pipeline. There are some tools, again mostly language-specific, which can be called to the cause of keeping applications in the production infrastructure safe, e.g. OWASP Dependency-Check.

Adding Sec to the DevOps pipeline

Let’s face it, we’re not running a Fortune 500 company here. We can’t have 100% secure applications, but we can do a damn sight better than what we’ve got right now! I propose a shift from vulnerability monitoring of the infrastructure to vulnerability testing of the applications before they even get there.

If development of research applications follows a continuous integration workflow, adding compliance and vulnerability testing to the pipeline represents just another step for the application to pass. Sure, there is a conceptual leap to make: going from “Trust me, Ops! This opaque blob of data is totally benign!” to “Ok Ops, you’ve got your tests, I’ve got mine and they’re all passing” is perhaps a big one for many. For people to adopt this approach, Ops needs to deliver a smoother experience and better support than they are currently delivering to users: builds for arbitrary applications, arbitrary environments and configurations, and what Devs love about 12 Factor Apps: Dev/Prod Parity.

Something we can all trust

Once we have a pipeline, we can and should raise security or vulnerability issues wherever we can, along with all the infrastructure tests. Furthermore, these tests should be separated from the application tests themselves. In other words, if Ops provides a tested and immutable environment for Dev to build on, then the application should:

  • Ensure that it can build — Are the compilers and dependencies available? Are the relevant infrastructure services available?
  • Ensure that it is correct — Have errors been introduced into the environment in recent commits? Does the application maintain internal consistency in the build and execution environment provided by Ops?
  • Ensure that it will run — Does the execution environment permit the proper execution of the application, with access to relevant backing services?
  • Ensure that infrastructure remains immutable — Has the application made detectable changes to our infrastructure?

That last point is key. The rest of the tests (integration tests, unit or functional tests) are up to Dev. But trusting Dev to ensure that Prod is ok is like trusting the fox with the chickens — there’s an inherent conflict of interest, even if there is, as is most often the case, no malice. No, these tests need to be maintained by Prod, in collaboration with Sec.
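
As a toy illustration of that last check, assuming Ops keeps a list of paths that must stay immutable (the paths and the checksum mechanism below are hypothetical, not an existing EGI tool): record a baseline before handing the environment to the application, then verify nothing changed afterwards.

    import hashlib
    import pathlib

    WATCHED = ["/etc", "/usr/local/bin"]  # illustrative immutable paths

    def baseline(paths):
        """Checksum every file under the watched paths."""
        digests = {}
        for root in paths:
            for f in pathlib.Path(root).rglob("*"):
                if f.is_file():
                    digests[str(f)] = hashlib.sha256(f.read_bytes()).hexdigest()
        return digests

    before = baseline(WATCHED)
    # ... run the application and its own test suite here ...
    after = baseline(WATCHED)
    changed = {path for path in before if after.get(path) != before[path]}
    assert not changed, f"application modified the infrastructure: {changed}"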

Asserting Compliance

We then come full circle. EGI has an extensive list of security policies which can be used as a basis for writing compliance as code. They need to get out of whatever format they’re in now, and into something that can be executed. To quote the Chef pitch for InSpec:

  • Transform your requirements into versioned, executable, human-readable code.
  • Detect Fleet-wide Issues and Prioritize Their Remediation
  • Reduce Ambiguity and Miscommunication Around Rules
  • Keep up with Rapidly Changing Threat and Compliance Landscapes
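
To give a flavour of what “versioned, executable, human-readable” could mean in practice, here is a toy check written in Python rather than InSpec’s Ruby DSL: a hypothetical policy such as “remote root login must be disabled” becomes a test that a pipeline can run (for example under pytest) and keep under version control alongside the application.

    def test_sshd_forbids_root_login(path="/etc/ssh/sshd_config"):
        """Hypothetical policy check: remote root login must be disabled."""
        with open(path) as config:
            directives = [line.split() for line in config
                          if not line.lstrip().startswith("#")]
        assert ["PermitRootLogin", "no"] in directives, \
            "policy violation: root login is not explicitly disabled"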

EGI at the 2nd edition of the EOSC Summit

The second edition of the EOSC Summit will take place on 11 June 2018, in Brussels, Belgium.

The event will follow up on achievements and progress of the European Open Science Cloud. It will allow participants to share information on relevant activities and commitments and to reflect on the steps ahead for ensuring an effective implementation of the EOSC. It will also serve as an opportunity to launch a stakeholder consultation on the draft ‘Rules of Participation of EOSC’ and on the draft ‘FAIR Data Action Plan’, two key inputs for the future EOSC governance.

Tiziana Ferrari, Yannick Legre and Sergio Andreozzi will be present at the summit on behalf of the EGI Foundation and the EOSC-hub project.

Tiziana Ferrari, as Project Coordinator of EOSC-hub, will deliver two presentations. In the session “Progress towards the EOSC: services, architecture, access, rules, data”, Tiziana will present the EOSC-hub project, current developments and the service catalogue. Tiziana will also give a talk during the workshop “Rules of participation”, where she will discuss the EGI Operations activities.

Have a look at the full programme and tweet using the hashtag #EOSC.

New challenges in data science: Big Data and Deep Learning on Data Clouds

The summer course New challenges in data science: Big Data and Deep Learning on Data Clouds will take place in Santander, from 18 to 22 June, in the context of the DEEP Hybrid-DataCloud and eXtreme-DataCloud projects.

The course is targeted at specialists and students of different academic levels (master, graduate students, PhD candidates, postdoctoral students and senior scientists) interested in current research trends regarding compute-intensive data analytics techniques over massive amounts of data. A special emphasis is put on deep learning, high-performance computing and hybrid cloud platforms.

The first half of the course is devoted to the study and analysis of different use cases (in Astrophysics and Particle Physics, Bioinformatics and Biodiversity). Over these sessions, an in-depth analysis and consolidation of technical requirements will be performed, with the objective of understanding the present and future challenges in these scientific areas.

The last part of the course will consist of a description of a practical deployment and implementation of the tools required to perform the aforementioned massive data processing on top of a cloud computing environment. The integration of existing HPC systems (such as supercomputers) into cloud environments will also be tackled.

The discussion will be framed in the context of the European Open Science Cloud, with a focus on researchers’ requirements and the different computing platforms present at both the national and European level.

See all the details of the course.

Metabolomics data in the context of metabolic networks

Metabolomics datasets are the outcome of biochemical events ruled by enzymatic reactions. All these reactions, and the related substrates and products, can be gathered in a single mathematical object called a metabolic network. This webinar will discuss how these networks are built, how they can be modelled in a mathematical formalism (a graph) and how these graphs can be used to provide biochemical insight into metabolic fingerprints.
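
For the curious, the underlying data structure is easy to sketch in Python: metabolites become nodes and enzyme-catalysed reactions become directed edges, after which standard graph algorithms can ask biochemical questions. The two-reaction fragment below is illustrative only, not webinar material.

    import networkx as nx

    network = nx.DiGraph()
    # First two steps of glycolysis: substrate -> product, labelled by enzyme
    network.add_edge("glucose", "glucose-6-phosphate", enzyme="hexokinase")
    network.add_edge("glucose-6-phosphate", "fructose-6-phosphate",
                     enzyme="phosphoglucose isomerase")

    # e.g. is one metabolite reachable from another?
    print(nx.has_path(network, "glucose", "fructose-6-phosphate"))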

Important details:

Date and time: Friday 6 July 2018, from 10:00 to 11:00 (GMT).

How to register

EU Competitiveness Council endorsed the implementation roadmap for the European Open Science Cloud

The Implementation Roadmap for the European Open Science Cloud was endorsed by EU research ministers on 29 May in Brussels. During the event, Carlos Moedas, Commissioner for Research, Science and Innovation, put an emphasis on the following aspects for the realisation of the European Open Science Cloud:

  • The Cloud should be a wide, pan-European federation of existing and emerging excellent infrastructures, which respects the governance and funding mechanisms of its components
  • Membership in this federation would be voluntary
  • The governance structure would include member state ministries, stakeholders and scientists.

The EOSC-hub project plays a crucial role in this ambition, as it is set to create the integration and service management structure of the EOSC – the hub – where researchers and innovators can discover, access, and use a variety of advanced data-driven resources.

In the meeting, Carlos Moedas also presented the integrated advice of the Open Science Policy Platform (OSPP). Established in 2016, the platform is made up of stakeholders who advise the Commission on how to further develop and implement an Open Science policy in Europe.

Overall, this political endorsement will accelerate the implementation of the EOSC. The next important events are:

  • EOSC Summit: the scientific and policy stakeholders of the EOSC will gather in Brussels on 11 June to discuss the rules of participation in the EOSC and the drafting of pan-European principles for FAIR data.
  • DI4R 2018: the third edition of DI4R will be held in Lisbon, from 9 to 11 October and will showcase the policies, best practices and services necessary for the support of research.
  • Launch of the EOSC governance structure: on 23 November, the incoming Austrian Presidency of the Council plans to gather research and innovation ministers to sign off the governance structure, which will steer the work of several projects under Horizon 2020, and to launch the first version of the EOSC Portal.