CGW’18 Workshop: call for papers

The CGW’18 workshop will take place in Krakow from 22 to 24 October 2018, and is organised by ACC Cyfronet AGH, the AGH University of Science and Technology and the AGH Department of Computer Science.

The CGW’18 workshop will address the following topics:

  • e-Science and collaborative applications
  • models, methods and tools for collaborative applications development
  • virtual laboratories and problem solving environments
  • big data analysis and machine learning
  • distributed data management
  • software engineering aspects with industrial and social implications

And more. The call for papers is open until 30 September 2018.

See all the details of the workshop.

 

Call for abstracts: International Symposium on Grids and Clouds (ISGC 2019)

The International Symposium on Grids and Clouds (ISGC) 2019 will take place from 31 March to 5 April 2019 at Academia Sinica, in Taipei, Taiwan.

This year, the event’s theme is “Efficient and Safe Processing of FAIR Open Data”. The goal of ISGC 2019 is to create a face-to-face venue where individual communities and national representatives can present and share their contributions to the concepts of Open Data and Open Science.

The call for papers is now open!

Important dates and information:

On-line Submission
• Submission Deadline: Monday, 5 November 2018
• Abstract Word Limit: 400 (minimum) to 600 (maximum) words
• Acceptance Notification to Authors: Wednesday, 12 December 2018

Find out more about the event.

Presentation on European Helix Nebula Science cloud

As part of the Helix Nebula initiative, the European Helix Nebula Science cloud project has spent the last two years running a Pre-Commercial Procurement to establish a European hybrid cloud platform that supports the deployment of high-performance computing and big-data capabilities for scientific research. From the beginning of the project, SURFsara has been one of the ten European institutes acting as a buyers group, selecting between the four vendor consortia on the basis of technical and economic reviews. The project has gone through a design phase and a prototype phase, each of which reduced the number of eligible consortia, leaving the two that are now in the current pilot phase.

As part of the pilot phase, representatives of the two remaining consortia, T-Systems and RHEA, will visit the buyers group institutes to demonstrate and present their offerings. On 11 September they will be at SURFsara for an afternoon of presentations, ending with drinks.

In addition to the presentations by the consortia and on OneData, there will be presentations of three of the SURFsara test cases from the project: WeNMR (with reservation), LOFAR and the Nikhef Stoomboot scale-out. The afternoon will end with a discussion of what the Helix Nebula Science cloud might mean for current or future SURF services.

Registration is open (access pin: 0303)

Program (September 11th, VK1/2 SURFsara Amsterdam):

  • 14.00 – 14.10 Welcome and introduction to HNSciCloud
  • 14.10 – 14.30 Overview of OneData
  • 14.30 – 15.00 Overview of pilot platforms RHEA
  • 15.00 – 15.30 Overview of pilot platforms T-Systems
  • 15.30 – 16.00 Coffee
  • 16.00 – 17.15 Use case presentations and discussion

 

The 10th edition of the School on Efficient Scientific Computing (ESC)

The tenth edition of the School on Efficient Scientific Computing (ESC School) will be held from 22 to 27 October 2018, at the University Residential Centre of Bertinoro (Ce.U.B.), in Bertinoro, Italy.

The aim of the ESC school is to offer the participants the opportunity to improve their computing competences, learning from qualified and experienced scientists how to best exploit modern hardware and software technologies in their daily scientific work. The program proposes introductory lectures on trends in hardware architectures and parallel programming, with more in-depth lessons on modern C++, effective memory usage, floating-point computation and programming in a heterogeneous world combining multi-threading, GPUs and clusters.

The school is organised as a small class of about 30 students, alternating lectures, hands-on sessions and self-managed time slots for the best learning experience, with lecturers available during the whole week for insights and discussions.

Applications are now open; please see the event’s website for more information.

Ansible Style Guide in Action

This article was written by Bruce Becker, Senior Operations Officer at the EGI Foundation, and re-posted here with permission. You can also read it on Bruce’s personal blog.

 

A few weeks ago, I announced a style guide for developing Ansible roles.

The intended audience is the developers of middleware components, and the aim of the guide is to improve our ability to collaborate and to deliver products smoothly and reliably, without breaking the infrastructure in general.

A typical case would be an existing product which performs some specific function e.g., a storage management front-end service. Another case would be the one I want to use as an example here – the so-called “worker-node” function.

Case: Worker Node

The worker node is essentially a composition of clients which interact with infrastructure components:

  • validate user token
  • get data
  • submit workload request to your local resource manager
  • check how that’s going
  • send accounting data

etc. If we were starting out now, these functions might well be built as serverless endpoints, but as with all things infrastructure-related, one has to deal with the legacy of what came before.

The worker node function has typically been distributed as a meta-package in OS repositories – an RPM or DEB which expresses all of the necessary dependencies. A site wishing to provide the worker node function could therefore easily ensure that this was present by simply installing the metapackage. That is, if the prerequisite state is assured.

That’s a big if and a big ask in 2018.

Building the worker node now

If we had the case of a totally new site in the federation wishing to participate by offering compute resources, we would probably want this site to be integrated and functional with little demand on the site itself. If we start from this position, we might well consider the site resources and the layer of middleware necessary to federate it as separated by a well-defined contract: we (the federation) give you a bunch of endpoints to send data to, and you (the resource provider) send the data. We’ll go one step further and provide you with the functions which send that data, so that you have zero interference with your setup.

This separation of function from platform is why containers were developed.

Modelling services

How would things be different if we approached this from a 12-factor point of view? We have to deal not only with the installation of binaries and other files, but also with their configuration around site-specific setups, procedures and policies. The last thing a product team wants to deliver to an endpoint which will eventually use it is a product which doesn’t play nicely with the rest of the environment. This could include, for example, hard-coding certain paths, asserting the presence of particular users, or using the network in a specific way. All of these would be examples of “bad behaviour”, since the integration of the site into the federation is not done according to a central prescription, but according to an OLA agreed to by both parties.

We therefore need to deliver not only products, but also strategies for deploying those products, which are flexible enough to respect local site policies. If we are to be fluid, we also need a high degree of trust that the final result will not only perform as advertised (i.e., works, and does what it needs to do), but also won’t break local setups. The Ansible Style Guide describes aspects of developing, testing, documenting and delivering the role. It is more about how than what, because the overriding, big-picture goal is to solve problems and have them stay solved.

The way to do this is, as with most engineering problems, to factor out the big problem into smaller ones in some logical way. Doing things in this way, we come to have a sort of “dependency tree” of roles, so that infrastructure engineers can separate problems and solve them permanently.

This has the happy consequence that end users (typically, site administrators) can re-use these products with confidence at their site, know where to go for support and understand how to contribute back. Looking at the simple case of building a User Interface, shown in Figure 1, this is quite easy to understand. We can even link the roles themselves to various actions and outputs, as shown in Figure 2.

 

Figure 2: Representation of the expression of the Ansible role and the resulting product – container images in EGI’s Quay Organisation. The vertical axis describes the dependency graph of the Ansible roles for UMD products, while the horizontal axis shows how these are expressed in various environments by applying them. The final products (container images in this case) are immediately re-usable.
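One way to read such a dependency graph is as an ordering constraint: a role can only be applied once the roles it depends on have been applied. The Python sketch below illustrates this with the standard library's graphlib module. The role names are the ones used in the playbook later in this article, but the dependency edges between them are assumed for illustration and are not taken from the figures.

```python
from graphlib import TopologicalSorter

# Each role maps to the set of roles it depends on.
# These edges are assumptions made for this sketch.
ROLE_DEPENDENCIES = {
    "EGI-Foundation.umd": set(),
    "EGI-Foundation.voms-client": {"EGI-Foundation.umd"},
    "ansible-role-wn": {"EGI-Foundation.umd", "EGI-Foundation.voms-client"},
}


def application_order(dependencies):
    """Return the roles in an order that applies dependencies first."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for role in application_order(ROLE_DEPENDENCIES):
        print(role)
```

A cycle in the graph would make TopologicalSorter raise an error, which is a useful early warning that two roles have not really been separated.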

In this way, we can continue modelling individual roles and map events in source code to artifacts in production. The final touches to our modelling flow are added in Figure 3, where we add the links to the respective GitHub repositories and the all-important testing phase – more on that in a later section.

Figure 3: Schematic diagram of the full continuous integration and delivery of UMD configurations, as well as the dependency tree of the respective Ansible roles, for the simple case of the User Interface. In this case, we deliver pre-built Docker images to the Quay registry. Testing is done with TestInfra, a Python-based infrastructure spec tool.

Action

Now that we have a clear idea of how to go about modelling our roles, and putting the tools in place for our continuous integration and delivery pipeline, we can take a closer look at using the EGI Ansible Style Guide to get started.

Getting started

The first thing you need to do is get the style guide, and use it to create a new Ansible role. Ansible roles are usually generated with the Ansible Galaxy CLI command init, but this uses a role skeleton which doesn’t cover many of EGI’s bases. We therefore use the egi-galaxy-template in the Style Guide repo to generate a better one:

git clone https://github.com/EGI-Foundation/ansible-style-guide
ansible-galaxy init --role-skeleton=ansible-style-guide/egi-galaxy-template ansible-role-wn

We now have a shiny new Ansible role: ansible-role-wn. Before we go about implementing it, we need to have a means for implementing tests and generating test scenarios. Typically we use Molecule for this, which is great for generating a full set of test scenarios and strategies.

Install Molecule with pip, and generate a scenario, using a virtualenv:

$ virtualenv style
$ source style/bin/activate
(style)$ pip install molecule
(style)$ molecule init scenario -r ansible-role-wn

Initial Commit

At this point we have an empty (but stylish) role in a clean environment and a default testing scenario.

Running the test strategy should result in all of it passing [2]:

molecule lint
molecule dependency
molecule create
molecule converge
molecule verify

This means absolutely nothing, of course – we need to start adding some failing tests!

Tests and Development of Roles

The EGI UMD follows something similar to an Acceptance Test Driven Development pattern.

There are several products, each of which is tested independently upstream by its owners, and candidates for inclusion in the distribution are then communicated to the release co-ordination team. This team then checks whether the UMD Quality Criteria are respected by the product, and whether the new version breaks anything already in production. There are several strategies for doing this, and the one which makes the most sense varies from product to product. Then of course, there is the expected functionality of the product as it would be in production. Finally, there is the consideration that we expect these roles to be deployed into production, which means that the configurations should be hardened and secure by design. Deploying faulty configurations into production environments – even with fully-patched software – can lead to serious degradation in operational security.

We therefore need to implement tests for each of these, as far as we can.

Test-Driven Development [3], from Extreme Programming [4], suggests that engineering proceed on a “Red, Green, Refactor” cadence.

RED

Considering we are developing the functionality of a worker node here, the first thing we could check for is that the relevant packages are actually present. Using TestInfra’s package module, we can write this assertion.

def test_packages(host, pkg):
    assert host.package(pkg).is_installed

Seems simple, right? All we need to do is pass the correct fixtures to the function test_packages, to see whether the host we will provision with molecule is in the desired state.

It is important to remember what we are testing for here. We are not testing whether the Ansible playbook has run correctly – or even whether an Ansible playbook has run at all – we are simply making assertions about the host. These assertions should be true no matter how the host arrived at its current state, and of course should reflect the desired state in production environments.

We therefore need to consult the source of truth [5] for the worker node package requirements – the same repository that the product team maintains, and that the UMD team has tested and run its QC tests on – to write the fixtures for this test.

We can still converge the role with no problems (nothing has been implemented yet), but when it comes to running the tests (molecule verify), we will be duly informed that they are all FAILING.

Great success. Go ahead and add that test to the scenario:

git add molecule/default/test_packages.py
git commit -m "Added failing test for packages"
git push

Note: using the EGI Ansible Style Guide, there is a .travis.yml already set up for you if you want to do CI on Travis. All you need to do is enable the repository and Travis will take care of the rest.

GREEN

The next step in TDD is to implement just enough code to make that test pass. With Ansible, this is almost too easy:

First, create a variable in defaults/main.yml to hold the packages that need to be present, taking into account differences across operating systems and OS releases:

---
# defaults/main.yml
# Second-level keys match the ansible_distribution_major_version fact.
packages:
  redhat:
    '6':
      - wn_pkg_1
      - wn_pkg_2
    '7':
      - WN_1
      - WN_2
  debian:
    '8':  # jessie
      - worker_node
    '9':  # stretch
      - worker_node

Next, add a task which ensures that those packages are present:

---
# tasks/main.yml
- name: Ensure worker node packages are present
  package:
    name: "{{ item }}"
    state: present
  loop: "{{ packages[ansible_os_family | lower][ansible_distribution_major_version] }}"

Here, we take advantage of the facts gathered by Ansible identifying the host OS and version – which of course is why we crafted the variable packages in the way we did.
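The lookup performed by that expression can be written out in Python to make the behaviour explicit. This is only a sketch: the facts dictionary stands in for the facts Ansible gathers (ansible_os_family and ansible_distribution_major_version are the usual fact names), the function select_packages is hypothetical, and the package names are the placeholders used above.

```python
PACKAGES = {
    "redhat": {
        "6": ["wn_pkg_1", "wn_pkg_2"],
        "7": ["WN_1", "WN_2"],
    },
}


def select_packages(facts, packages=PACKAGES):
    """Pick the package list for a host from its OS family and major version."""
    family = facts["ansible_os_family"].lower()
    major = str(facts["ansible_distribution_major_version"])
    try:
        return packages[family][major]
    except KeyError:
        # Fail loudly rather than silently installing nothing.
        raise LookupError(f"no packages defined for {family} {major}") from None


if __name__ == "__main__":
    centos7 = {
        "ansible_os_family": "RedHat",
        "ansible_distribution_major_version": "7",
    }
    print(select_packages(centos7))
```

Failing on an unsupported platform is a deliberate choice in this sketch: a host for which nothing is defined should stop the run, not pass quietly.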

Of course, these tasks need to be applied in an actual playbook. Molecule creates the simplest possible playbook for the scenario for you:

# molecule/default/playbook.yml
---
- name: Converge
  hosts: all
  roles:
  - role: ansible-role-wn

This playbook is used during the converge stage. If there are any dependencies which are required (which are now clear from our dependency tree!), they can be added before the application of the role you are working on:

# molecule/default/playbook.yml
---
- name: Converge
  hosts: all
  roles:
    - {role: EGI-Foundation.umd, release: 4, tags: "UMD" }
    - {role: EGI-Foundation.voms-client, tags: "VOMS" }
    - {role: ansible-role-wn, tags: "wn"}

Once we have implemented the functionality, we repeat the converge and verify until the tests are passing.

REFACTOR, REPEAT

Figure 4: Schematic representation of a Test-Driven Development of an Ansible role.

Once the tests are passing, we take another look over our code and tests and try to ascertain whether the tests are really doing what we want them to do and whether that part of the role has been implemented in the best possible way. Figure 4 shows a general workflow of how this should be done.

Conclusions

Clearly, we are not done with the development of the worker node role. However we can be sure that application of this role to any production site will not break the site – a very important point! We now have a solid base from which to step to the next iteration, adding tests for desired behaviour and functionality to achieve it as we go. We also have the means to express this role in arbitrary environments – be they bare metal, hypervisor virtualisation, or Linux containers – all from a single well-maintained role.

As discussed above, the worker node needs to be able to perform many functions – we should try to implement tests for as many of these functions as we can. Similarly, as many of the EGI Quality Criteria as possible should be included in our test coverage, so that we can assure sites that by applying these roles off-the-shelf, they will be increasing the stability of their site and decreasing their day-to-day operations load.

Furthermore, by using a common style guide for developing these roles, we make it easier for others who want to contribute to get started. The style guide helps peers and collaborators do code review when features or developments are proposed via pull request, and gives clear guidelines for how these contributions should be recognised.

All in all, this is a small step towards improving the stability of sites in the EGI federation, without compromising agility and quality, and reducing the friction in the middleware delivery pipeline.

References and Footnotes
  1. “Developers of middleware components” is an EGI-federation-specific way of thinking of this audience. What I have in mind is maintainers or product owners who want their products to live in the EOSC ecosystem. Even products which may live at the boundary of this ecosystem may be relevant. ↩
  2. This is long-hand for molecule test, which will execute the full testing strategy. ↩
  3. A good overview of test-driven development was written by Martin Fowler ↩
  4. Beck, K., & Andres, C. (2015). Extreme programming explained: Second edition, embrace change. Boston: Addison-Wesley. ↩
  5. In this case, it’s the WN Metapackage repository ↩

France Grilles Operation Workshop

A quick summary by Jerome Pansanel, who attended the event on behalf of France Grilles.

The France Grilles Operation Workshop is an annual event gathering many actors (site admins, users and France Grilles partners) working with grid and cloud computing technologies in France.

The 2018 edition of the workshop took place in Montpellier from 27 to 29 June and was hosted by LUPM. The talks covered a wide range of topics, including:

  • Interoperability of Cloud, HPC and HTC infrastructures
  • Last updates about the iRODS storage facility (FG-iRODS)
  • Prospective analysis on Cloud computing (Cloud federation, security, application portal, EGI federated cloud, containers, edge computing)

Baptiste Grenier, Senior Operations Officer at the EGI Foundation, gave two talks – one on the XDC project and another one on the EGI Federated Cloud.

The event’s presentations are available online.

The CORBEL Medical Infrastructure Users Forum: Paris, 15 October

The CORBEL Medical Infrastructure Users Forum will take place on 15 October in Paris.

The forum is meant to promote collaboration between research communities, funding bodies and medical research infrastructures. This year the forum will focus on emerging technologies in the context of big data and personalised/stratified medicine.

The conference will be an opportunity to raise awareness of medical RIs and exchange with scientific communities.

The October 2018 MIUF meeting is intended to discuss:

  • the structuring of medical research communities at the pan-European level
  • the emerging needs of medical research projects in the context of the big data and personalised / stratified medicine approach
  • and the challenges raised in terms of development and deployment of data services

Registration is open.

The European HTCondor Workshop: 4-7 September

The European HTCondor Workshop will take place this year in the United Kingdom, from 4 to 7 September 2018, hosted by Rutherford Appleton Laboratory (RAL) in Oxfordshire with help from the STFC Scientific Computing Department and GridPP UK project.

The workshop will be an excellent occasion for learning from the sources (the developers!) about HTCondor, exchanging with your colleagues about experiences and plans, and providing your feedback to the experts. Participation is open to anyone interested in HTCondor.

A reduced early-bird registration fee will apply until 31 July.

See all the details of the workshop.

The European Big Data Value Forum 2018: call for workshops

The European Big Data Value Forum (EBDVF) is a key European event for industry professionals, business developers, researchers, and policy makers to discuss the challenges and opportunities of the European data economy and data-driven innovation in Europe.

As part of its 2018 Presidency of the Council of the European Union, Austria will host the second European Big Data Value Forum from 12 to 14 November. Keynotes and presentations will cover cutting-edge industrial applications of Big Data technologies, artificial intelligence, innovative business cases of the data economy, inspiring future visions, and insights on EU policy-making and R&D&I funding in this area.

A call for workshops for EBDVF 2018 Day 3 (November 14) is now open until 12 July.

The collaborative workshops will open up BDVA and BDV PPP activities to all stakeholders and citizens interested in contributing to the European Big Data Value Ecosystem. Over 12 workshops and networking activities are expected to be organised during this day.

If you are interested in organising a workshop at EBDVF 2018, please submit your draft proposal by 12 July.

See all the details of the event.

DevSecOps: Including Security in the Continuous Delivery Pipeline

This article was written by Bruce Becker, Senior Operations Officer at the EGI Foundation, and re-posted here with permission. You can read the original article on Medium.

 

I have been thinking about how to include a vulnerability scan in a pipeline to deliver applications to EGI FedCloud sites. It goes a little like this.

The big picture

A CSIRT’s job is never done! A distributed computing platform is inherently open to risks, even more so a collaborative platform. Ensuring that the platform is safe and secure for users is a thankless, full-time job. The attack surface can be very large indeed when one considers a platform like FedCloud — there are several layers to it which may provide vectors for exploits. While the majority of these can be locked down by the operators of these sites, at the end of the day, they are still used by … well, users.

Whose cloud is it anyway?

There is the usual thin line to tread between usability, ease of access, and security. Much of the appeal of a thing like FedCloud is the freedom of users — and the communities which they belong to — to define their own applications and workload scenarios, molding the basic infrastructure into something they are comfortable with. In essence, by providing a common IaaS or PaaS layer, FedCloud allows users to deploy arbitrary applications, at their own speed, under their own control. Of course, with great freedom comes great responsibility.

There is an inherent difference between users and operators: the former are trying to optimise their usage of an infrastructure, while the latter are trying to optimise the stability thereof. It’s not that either of these players is malicious per se, but their different priorities generate a natural conflict, one which perhaps cannot entirely be removed, but which can be mediated.

Something to talk about

How could subtle changes to the environment improve the relationship between operators and users? Perhaps the first positive step is to surface issues before they become a problem. The second is to provide a common language for announcing vulnerabilities and clear, easy-to-execute instructions for mitigating them when detected. With Dev (the users), Sec (the CSIRT) and Ops (the infrastructure) all on the same page, any conflicts can be discussed in an objective manner.

Prevention is better than cure

Currently, EGI CSIRT does a great job of scanning the endpoints of the infrastructure to detect vulnerabilities. This is necessary, since these vulnerabilities are not a static set: new ones are being found continuously. Fixing machines that are already deployed is a necessity in order to provide a secure platform, but what about the new applications that are built and deployed by user communities? Wouldn’t it be nice if these applications could be checked before they were deployed [1]? In a perfect world, there would be a more-or-less well-defined pipeline through which applications could flow, before landing in the production environment.

Typically, this would include all the great things that you can imagine in a pipeline: continuous integration, testing, and delivery. It would be awesome if we could surface the vulnerabilities or security risks at the same time as surfacing run-time or deploy-time issues. Put differently, we wouldn’t deploy a broken application, so why deploy an insecure one?

How to spot a vulnerability

That of course raises the question: how do we know that an application is insecure?

A naive answer appears to be “Just scan the damn things for known vulnerabilities”. Indeed, there are several tools out there for doing this, including Pakiti, Clair and others. There are also tools for delivering “compliance as code”, particularly InSpec and TestInfra. These tools typically compare installed packages against a vulnerability database. They are designed to check OS packages. They may also work with some language ecosystems like Ruby [2] and Node [3] (maybe not so much with Python or Go), but that relies on specific packages which can be matched against a vulnerability database. These scanners do not actually do penetration testing, as is usually the case in network scanning systems or more advanced penetration testing systems.

And herein lies the catch: what if your applications are not delivered with packages? This is indeed the case with, e.g., CODE-RADE, where everything is built from scratch and no “packages” are installed. We could, for example, tag builds according to the versions of the source code built, and then match those against the CVE databases, perhaps.
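A rough sketch of that idea follows: build tags of the form name-version are parsed and compared against a list of advisories. Everything here is hypothetical, including the tag format, the ADVISORIES entries and their identifiers. A real implementation would query an actual CVE data source and would need a more careful treatment of version strings.

```python
# Made-up advisories: (identifier, software name, first fixed version).
ADVISORIES = [
    ("EXAMPLE-0001", "libfoo", "1.2.4"),
    ("EXAMPLE-0002", "barlib", "3.0.0"),
]


def parse_version(text):
    """Turn a dotted numeric version such as '1.2.3' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def parse_tag(tag):
    """Split a build tag such as 'libfoo-1.2.3' into (name, version tuple)."""
    name, _, version = tag.rpartition("-")
    return name, parse_version(version)


def vulnerable_builds(tags, advisories=ADVISORIES):
    """Return (tag, advisory id) pairs for builds older than the fixed version."""
    findings = []
    for tag in tags:
        name, version = parse_tag(tag)
        for advisory_id, affected, fixed_in in advisories:
            if name == affected and version < parse_version(fixed_in):
                findings.append((tag, advisory_id))
    return findings


if __name__ == "__main__":
    print(vulnerable_builds(["libfoo-1.2.3", "barlib-3.1.0"]))
```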

Although this is a design feature of CODE-RADE, it may be a mere convenience for many use cases. Users may simply hack their application into shape until it works, then tag a VM or container and call it a day. Detecting vulnerabilities introduced in such applications is going to be a risky business unless a true penetration testing suite can be introduced to the delivery pipeline. There are some tools, again mostly language specific, which can be called to the cause of keeping applications in the production infrastructure safe, e.g. OWASP Dependency Check [4].

Adding Sec to the DevOps pipeline

Let’s face it, we’re not running a Fortune 500 company here. We can’t have 100% secure applications, but we can do a damn sight better than what we’ve got right now! I propose a shift from vulnerability monitoring of the infrastructure to vulnerability testing of the applications before they even get there.

If development of research applications follows a continuous integration model, adding compliance and vulnerability testing to the pipeline represents just another step for the application to pass. Sure, there is a conceptual leap to make: going from “Trust me, Ops! This opaque blob of data is totally benign!” to “Ok Ops, you’ve got your tests, I’ve got mine and they’re all passing” is perhaps a big one for many. In order to have people adopt this way of working, Ops needs to deliver a smoother experience and better support than they are currently delivering to users: builds for arbitrary applications, arbitrary environments and configurations, and what Devs love about 12 Factor Apps: Dev/Prod Parity [5].
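The idea of compliance and vulnerability testing as “just another step” can be illustrated with a small sketch in which every stage is a function that returns a list of problems, and delivery stops at the first stage that reports any. The stage names and checks are invented for this example and do not describe an existing EGI pipeline.

```python
def run_pipeline(stages, application):
    """Run stages in order; return (passed, name of failing stage, problems)."""
    for name, check in stages:
        problems = check(application)
        if problems:
            return False, name, problems
    return True, None, []


# Hypothetical checks operating on a plain dict that describes the application.
def unit_tests(app):
    return [] if app.get("tests_pass") else ["unit tests failed"]


def vulnerability_scan(app):
    return [f"known vulnerability in {pkg}" for pkg in app.get("vulnerable", [])]


# The security stage is just one more entry in the list.
STAGES = [("test", unit_tests), ("security", vulnerability_scan)]

if __name__ == "__main__":
    app = {"tests_pass": True, "vulnerable": ["libfoo"]}
    print(run_pipeline(STAGES, app))
```

Because both kinds of check share the same interface, an insecure build is reported in exactly the same way as a broken one.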

Something we can all trust

Once we have a pipeline, we can and should raise security or vulnerability issues wherever we can, along with all the infrastructure tests. Furthermore, these tests should be separated from the application tests themselves. In other words, if Ops provides a tested and immutable environment for Dev to build on, then the application should:

  • Ensure that it can build: are the compilers and dependencies available? Are the relevant infrastructure services available?
  • Ensure that it is correct: have errors been introduced into the environment in recent commits? Does the application maintain internal consistency in the build and execution environment provided by Ops?
  • Ensure that it will run: does the execution environment permit the proper execution of the application, with access to relevant backing services [6]?
  • Ensure that infrastructure remains immutable: has the application made detectable changes to our infrastructure?

That last point is key. The rest of the tests (integration tests, unit or functional tests) are up to Dev. But trusting Dev to ensure that Prod is ok is like trusting the fox with the chickens — there’s an inherent conflict of interest, even if there is, as is most often the case, no malice. No, these tests need to be maintained by Prod, in collaboration with Sec.
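One simple way Ops could detect such changes is to fingerprint the relevant part of the file system before and after the application runs, and compare the two snapshots. The sketch below does this with SHA-256 hashes. The function names are hypothetical, and a real check would also need to cover things like permissions, users and running services.

```python
import hashlib
import os


def snapshot(root):
    """Map each file under root (by relative path) to the SHA-256 of its content."""
    digests = {}
    for directory, _, files in os.walk(root):
        for filename in files:
            path = os.path.join(directory, filename)
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            digests[os.path.relpath(path, root)] = digest
    return digests


def changes(before, after):
    """Return sorted lists of added, removed and modified paths."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    modified = sorted(
        path for path in set(before) & set(after) if before[path] != after[path]
    )
    return added, removed, modified
```

Taking a snapshot before deployment and another afterwards, an empty result from changes() would be the assertion that the infrastructure stayed as Ops left it.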

Asserting Compliance

We then come full circle. EGI has an extensive list of security policies which can be used as a basis for writing compliance as code. They need to get out of whatever format they’re in now, and into something that can be executed. To quote the Chef pitch for InSpec:

  • Transform your requirements into versioned, executable, human-readable code.
  • Detect Fleet-wide Issues and Prioritize Their Remediation
  • Reduce Ambiguity and Miscommunication Around Rules
  • Keep up with Rapidly Changing Threat and Compliance Landscapes
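
To make the first point concrete, here is a minimal sketch of a written requirement turned into an executable check. The rule shown (no root login over SSH) is a generic example chosen for illustration, not a quotation from the EGI security policies, and the helper names are hypothetical. Tools such as InSpec or TestInfra provide far richer ways of expressing the same thing.

```python
def sshd_options(config_text):
    """Parse sshd_config-style text into a dict keyed by lower-cased option name."""
    options = {}
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        value = parts[1].strip() if len(parts) > 1 else ""
        # sshd uses the first value it finds for an option, so keep the first.
        options.setdefault(parts[0].lower(), value)
    return options


def root_login_disabled(config_text):
    """Requirement: PermitRootLogin must be explicitly set to 'no'."""
    return sshd_options(config_text).get("permitrootlogin", "").lower() == "no"


if __name__ == "__main__":
    example = "# example configuration\nPort 22\nPermitRootLogin no\n"
    print(root_login_disabled(example))
```

The requirement now lives in version control next to the code it constrains, and can be run by Dev, Sec and Ops alike.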