We organise monthly seminars in areas related to Data Science. To receive notifications about future events, please subscribe to our events mailing list.
Title: ChatGPT for the Language of Life
Speaker: Dr Petr Simecek
Date and time: Wednesday 19 April 2023, 12:00 noon
Venue: Zoom
Abstract: The realm of natural language processing (NLP) has witnessed a revolution with the advent of massive language models, such as GPT3.5, OPT, and BLOOM. Recently, similar neural network architectures have been adapted to genomics and proteomics, paving the way for advancements in these domains. In this presentation, we will discuss existing DNA and protein language models, namely DNABert, ProtBertBFD, and ESM2, and illustrate how they can be tuned to specific objectives. Furthermore, we will elucidate how the models' embeddings encapsulate both evolutionary and functional information, highlighting their significance. To conclude, we will demonstrate this methodology by addressing the problem of detecting a topological knot on the protein backbone. Specifically, we will classify proteins as knotted or unknotted based solely on their sequence.
Title: Quantum communications projects at UM
Speaker: Prof. Andre Xuereb
Date and time: Wednesday 15 March 2023, 12:00 noon
Venue: Zoom
Abstract: The security of our digital systems is founded on assumptions that are now known to be flimsy. We will soon have to overhaul all our cryptographic methods and even introduce entirely new technologies, based on quantum mechanics, to solve this problem. The University of Malta is engaged in a number of EU-funded projects that aim to develop the basis of future quantum-secured communication networks, deploy a network in Malta, and even prepare for quantum satellites. In this talk I will explain the foundational principles behind the so-called quantum threat, how quantum mechanics (and some fancy mathematics) may have the solution, and briefly touch upon four projects that we are involved in. Most of all, I want to discuss fully funded opportunities we have for postgraduate positions and studies. We want you to work with us!
Title: Vogt, Bailey and BOB: an exploration of local connectivity in the brain
Speaker: Dr Claude Bajada
Date and time: Wednesday 22 February 2023, 12:00 noon
Venue: Zoom
Abstract: My talk will discuss the work conducted at the University of Malta's “Boundaries of the Brain” lab, which aims to understand local connectivity in the brain using Magnetic Resonance Imaging (MRI) techniques.
Our group has developed software, the Vogt-Bailey toolbox, which utilises spectral graph theory to objectively measure the degree of homogeneity in cortical neighbourhoods and address the criticisms by Percival Bailey and Gerhardt von Bonin about the limitations of traditional cortical parcellation championed by Oskar and Cecile Vogt (i.e. splitting the brain into distinct regions).
I will discuss the inspiration behind the project, the advancements in the field of brain connectivity, and the impact of this work on our understanding of the brain.
Title: Data Storage on the Middle-Earth cluster
Speaker: Prof. Johann Briffa
Date and time: Wednesday 7 December 2022, 12:00 noon
Venue: Zoom
Abstract: In this second seminar in the series on submitting jobs on the Middle-Earth cluster, we consider the issue of data storage. We start with a few details on the different levels of storage available on the cluster, how this impacts our processing pipeline, and the effect of storage choice on speed of execution as well as storage efficiency and scalability. We then look at methodologies to determine what should be stored where, and how to express this in a job submission script. A number of common use cases are considered, and the best-practice approach is detailed for each case.
Title: Pushing the limits of RL in linear environments
Speaker: Dr Leander Grech
Date and time: Wednesday 12 October 2022, 12:00 noon
Venue: Zoom
Abstract: The LHC at CERN requires beam-based feedback systems to ensure correct operation. Traditionally, a Proportional-Integral (PI) controller together with a linear model of the beam-based system is used to apply corrections to the superconducting magnets in order to control specific beam and machine parameters, e.g. tune and orbit. Previous work developed a simulation environment for the Tune Feedback (QFB) system and it was shown that an RL agent can outperform a PI controller in cases where classical control algorithms generally fail. In this work, the ideas from this simulation were extended to create an environment called random environment (RE), which represents feedback systems of the same type with the additional functionality that the input and output dimensions can be set arbitrarily and the actions can be either continuous or discrete. The limitations of state-of-the-art RL algorithms are assessed with different configurations of RE, showing for example that popular deep reinforcement learning algorithms such as proximal policy optimization (PPO) perform poorly and unreliably on higher dimensional tasks. Non-parametric methods, which offer some theoretical guarantees, were used to assess the interplay between exploitation and exploration, and also to shed some light on the best practices to follow when training beam-based controller systems.
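As an illustration of the classical baseline mentioned in the abstract, the following is a minimal sketch of a discrete Proportional-Integral (PI) controller driving a linear model towards a reference value. It is not the LHC feedback system: the first-order plant `y[k+1] = a*y[k] + b*u[k]`, the gains `kp` and `ki`, and every numerical value are hypothetical choices made only for this example.

```python
def simulate_pi(setpoint, kp, ki, steps, a=0.9, b=0.1):
    """Simulate a PI controller acting on a toy first-order linear plant.

    The plant y[k+1] = a*y[k] + b*u[k] and its coefficients are
    illustrative assumptions, not a model of any real accelerator system.
    """
    y = 0.0          # controlled parameter (e.g. a generic beam parameter)
    integral = 0.0   # running sum of the error (the "I" term)
    trajectory = [y]
    for _ in range(steps):
        error = setpoint - y
        integral += error
        u = kp * error + ki * integral  # PI control law
        y = a * y + b * u               # linear plant response
        trajectory.append(y)
    return trajectory


trajectory = simulate_pi(setpoint=1.0, kp=2.0, ki=0.5, steps=200)
print(f"value after 200 steps: {trajectory[-1]:.6f}")
```

The integral term is what removes the steady-state error here; with `ki=0` the same loop would settle below the setpoint.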
Title: Efficient use of the Middle-Earth cluster
Speaker: Prof. Johann A. Briffa
Date and time: Wednesday 18 May 2022, 12:00 noon
Venue: Zoom
Abstract: In this seminar we go beyond the basics of submitting a job with the Slurm scheduler, considering various aspects to ensure our jobs run to completion as quickly as possible. We start with a few details on how the scheduler works, how this impacts our job specifications, and how our jobs affect others. We look at methodologies to determine what resources our jobs really need, so that we request the minimum necessary resources. We also consider what other jobs are currently running or in queue, and how to find out what resources are immediately available. Finally, we also look at ways to debug problems with our job submissions.
Title: Disaggregation and Placement of In-Network Programs
Speaker: Dr Nik Sultana
Date and time: Wednesday 23 February 2022, 12:00 noon
Venue: Zoom
Abstract: Programmable network switches and NICs are enabling the execution of increasingly rich computations inside the network using languages like P4. Today's in-network programming approach maps a whole P4 program to a single target, limiting a P4 program's performance and functionality to what a single target device can offer. Disaggregating a single P4 program into subprograms that execute across different targets can improve performance, utilization, and cost. But doing this manually is tedious, error-prone and must be repeated as topologies or hardware resources change.
This talk describes Flightplan: a target-agnostic programming toolchain that helps split a P4 program into a set of cooperating P4 programs and maps them to run as a distributed system formed of several, possibly heterogeneous, targets.
The talk will cover both the systems and the programming-language aspects of this research. We will look at evaluation results from testbed experiments and simulation. During the talk I will also describe how Flightplan's design addresses practical concerns, including the provision of a distributed diagnostics interface and the mitigation of partial failures.
Code, documentation, tests, a demo, and videos can be obtained from flightplan.cis.upenn.edu
Title: Docker: How to easily run an entire software stack locally and ease software distribution
Speaker: Dr Noel Farrugia
Date and time: Wednesday 19 December 2021, 13:00
Venue: Zoom
Abstract: Most of us researchers have no doubt dealt with libraries or software that require a number of steps and dependencies to be installed before they can be used. This process is rarely as easy as the user manuals make it out to be. Docker can be the solution to this, both as a consumer and as a producer of software. Docker gives you the ability to ensure that users of your software are running a setup identical to the one you specified in the Dockerfile, lowering the barrier to entry for such software.
In this seminar we will discuss what Docker containers are, how containers differ from virtual machines, how to create your own Docker container, and more.
Title: The VBIndex Toolbox: studying correlations in human brain function using fMRI data
Speaker: Dr Christine Farrugia
Date and time: Wednesday 15 December 2021, 12:00 noon
Venue: Zoom
Abstract: The development of Magnetic Resonance Imaging (MRI) has been instrumental to our understanding of the function and structure of the human brain. In this talk, we start by taking a look at the main features of this imaging technique and the associated pre- and post-processing pipelines. We then move on to the VBToolbox, a functional MRI (fMRI) analysis software package developed by the Boundaries of the Brain (BOB) group at the University of Malta. This package makes use of principles from spectral graph theory to detect correlations in the function of the human brain, both locally (at each voxel) and also across the whole brain. The data series collected during an MRI scan session are each associated with a volume element (voxel) within the brain, and the main idea is to consider each voxel as the node of a graph whose edges are weighted by the degree of similarity between the different series. We shall review some of the results obtained with the VBToolbox and discuss their implications, and end the talk by looking at current developments and future work.
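The graph construction described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the VBToolbox implementation: the three short time series are made-up data, the mapping of the Pearson correlation `r` to an edge weight `(1 + r) / 2` is an assumed choice, and the spectral analysis that would follow on the Laplacian is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long time series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def affinity_matrix(series):
    """Edge weights between voxels: similarity of their time series.

    The (1 + r) / 2 mapping to [0, 1] is an assumption for this sketch.
    """
    n = len(series)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w = (1.0 + pearson(series[i], series[j])) / 2.0
            weights[i][j] = weights[j][i] = w
    return weights


def graph_laplacian(weights):
    """Unnormalised graph Laplacian L = D - W (degree matrix minus weights)."""
    n = len(weights)
    return [
        [(sum(weights[i]) if i == j else 0.0) - weights[i][j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical time series for three voxels (one node per voxel).
voxel_series = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],   # moves with voxel 0
    [5.0, 4.0, 3.0, 2.0, 1.0],    # moves against voxel 0
]
weights = affinity_matrix(voxel_series)
laplacian = graph_laplacian(weights)
for row in laplacian:
    print([round(value, 3) for value in row])
```

Spectral graph methods then work with the eigenvalues and eigenvectors of such a Laplacian; which normalisation and which eigenvalue the toolbox uses is not stated in the abstract.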
Title: The Deep-FIR Project: Super-Resolution in the Wild
Speaker: Mr Matthew Aquilina
Date and time: Wednesday 10 November 2021, 12:00 noon
Venue: Zoom
Abstract: Super-resolution (SR) involves enlarging and enhancing low-resolution images using computational techniques to accurately predict the unseen details of an image (‘Zoom and Enhance’). The influx of deep learning has pushed SR performance well beyond what was previously thought possible with mathematical/statistical techniques alone. With today’s convolutional neural networks (CNNs), we can super-resolve artificially degraded (blurred, downsized, etc.) images into incredibly detailed and realistic results, or near-exact replicas of their original high-resolution source. However, SR is far from being a solved problem, as applying those same networks to real in-the-wild images (taken by a CCTV camera, smartphone, etc.) results in images of significantly worse quality, sometimes no better than the standard digital zoom available on our smartphones. In-the-wild images pass through a large (and typically unknown) number of degrading operations (lens blurring, image compression, noise influx, etc.) before they are eventually saved onto our devices. Even the largest neural networks are unable to decipher and reverse such a complex web of degradations without guidance. In the Deep-FIR project, our aim is to introduce new methods for equipping neural networks with the tools to identify and combat various degradations from any input image or video.
Our first step in this direction has been to introduce meta-attention, a lightweight mechanism which allows users to exploit image metadata (attributes) to configure any SR network to identify and counteract various degradations.
This seminar will delve into the details of meta-attention, as well as highlight and discuss the various research fronts the Deep-FIR project is investigating across both image and video SR.
Title: LigityScore: Convolutional Neural Network for Binding-affinity Predictions
Speaker: Mr Joseph Azzopardi
Date and time: Wednesday 23 June, 12:00 noon
Venue: Zoom
Abstract: Scoring functions are at the heart of structure-based drug design and are used to estimate the binding of ligands to a target. Seeking a scoring function that can accurately predict the binding affinity is key for successful virtual screening methods. Deep learning approaches have recently seen a rise in popularity as a means to improve scoring functions, with the key advantage of automatic feature extraction and the creation of a complex representation without feature engineering or expert knowledge. In this seminar we will present LigityScore1D and LigityScore3D, which are rotationally invariant scoring functions based on convolutional neural networks. LigityScore descriptors are extracted directly from the structural and interacting properties of the protein-ligand complex and are input to a CNN for automatic feature extraction and binding affinity prediction. This representation uses the spatial distribution of Pharmacophoric Interaction Points, derived from interaction features from the protein-ligand complex based on pharmacophoric features conformant to specific family types and distance thresholds. The data representation component and the CNN architecture, together, constitute the LigityScore scoring function. The main contribution of this study is to present a novel protein-ligand representation for use as a CNN-based scoring function for binding affinity prediction. LigityScore models are evaluated for scoring power on the latest two CASF benchmarks. The Pearson Correlation Coefficient and the standard deviation in linear regression were used to compare and rank LigityScore against the benchmark model and against other models recently published in the literature. LigityScore3D has achieved better overall results and showed similar performance in both CASF benchmarks. LigityScore3D ranked 5th on the CASF-2013 benchmark, and 8th on CASF-2016, with an average R-score performance of 0.713 and 0.725 respectively.
LigityScore1D ranked 8th place for the CASF-2013 and 7th place for CASF-2016 with an R-score performance of 0.635 and 0.741 respectively. Our methods show relatively good performance when compared to the Pafnucy model (one of the best performing CNN-based scoring functions), on the CASF-2013 benchmark using a less computationally complex model that can be trained 16 times faster.
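The two evaluation metrics named in the abstract, the Pearson correlation coefficient and the standard deviation in linear regression, can be computed as in the sketch below. The affinity values are made-up numbers, not LigityScore or CASF data, and the `N - 1` denominator in the standard deviation is an assumption about the convention used.

```python
import math


def scoring_power(predicted, experimental):
    """Return (Pearson R, standard deviation of the regression residuals).

    The experimental values are regressed on the predicted ones; the
    N - 1 denominator is an assumed convention for this sketch.
    """
    n = len(predicted)
    mean_x = sum(predicted) / n
    mean_y = sum(experimental) / n
    sxx = sum((x - mean_x) ** 2 for x in predicted)
    syy = sum((y - mean_y) ** 2 for y in experimental)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predicted, experimental))

    r = sxy / math.sqrt(sxx * syy)

    # Least-squares line: experimental ~ intercept + slope * predicted
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum(
        (y - (intercept + slope * x)) ** 2 for x, y in zip(predicted, experimental)
    )
    sd = math.sqrt(residual_ss / (n - 1))
    return r, sd


# Hypothetical predicted and experimental binding affinities (e.g. pK units).
predicted = [4.0, 5.0, 6.0, 7.0]
experimental = [4.5, 5.0, 6.5, 7.0]
r, sd = scoring_power(predicted, experimental)
print(f"R = {r:.3f}, SD = {sd:.3f}")
```

A higher R and a lower SD both indicate that predicted scores track the measured affinities more closely.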
Title: Machine Learning for Particle Accelerators
Speaker: Dr Ing Gianluca Valentino
Date and time: Wednesday 3 March, 12:00 noon
Venue: Zoom
Abstract: Although machine learning techniques have been applied to particle accelerators since the late 1980s, a renaissance has only been seen in recent years. This is due, in part, to the success of modern developments such as deep learning and, in part, is a result of the sophistication and data-intensiveness of current machines. The system dynamics of particle accelerators tend to involve large parameter spaces which evolve over multiple time scales, and interrelations between accelerator subsystems may be complex and nonlinear.
As a result, there is growing interest from the particle accelerator community to use machine learning techniques to analyze large quantities of archived data to accurately model accelerator systems, detect anomalous machine behavior, and perform active tuning and control. It is expected that machine learning will become an increasingly valuable tool to meet new demands for beam energy, brightness, reliability, and stability. This seminar will review the ongoing research activities in this area, as well as the contribution of the University of Malta in collaboration with the particle accelerator community.
Title: Stereo Vision from Earth Observation
Speaker: Dr Mang Chen
Date and time: Wednesday 17 February, 12:00 noon
Venue: Zoom
Abstract: The number of Earth observation satellites has increased drastically over the past decade, and some of these satellites can capture two (or more) images of the same region in quasi real time. Stereo vision techniques can then be used to automatically compute Digital Elevation Models (DEMs) that are important for a number of domains including hydrology, urban planning and natural hazard detection. These stereoscopically derived DEMs provide an efficient and low-cost means for remote mapping of surface topography over large areas and at multiple times for change detection.
The Centre National d’Etudes Spatiales (CNES) has developed the Stereo Pipeline for Pushbroom Images (S2P) framework, which combines the information obtained from the satellite with a stereo matching process to estimate the DEM. However, the stereo matching process adopted by this framework is based on classical techniques. The aim of the SAtellite TraIning and NETworking (SATINET) project is to adopt deep-learning based techniques to improve the stereo-matching process. WorldView-3 satellite images at a resolution of 30 cm and airborne Lidar data covering the area of San Fernando in Argentina were used in our evaluation. Whereas classical techniques have inherent shortcomings in feature extraction for textureless regions, repeated patterns and occlusions, deep learning methods automatically learn and calculate feature parameters through training. Compared with the 66.85% completeness of the classical technique (SGBM), our method reaches 74.05%, a gain of about 7 percentage points. The results further show that our approach is more robust than the state-of-the-art method.
Title: Can we connect Vision and Language using Graphs?
Speaker: Mr Brandon Birmingham
Date and time: Wednesday 16 December, 12:00 noon
Venue: Zoom
Abstract: A long-standing goal of Artificial Intelligence is to have agents capable of understanding and interpreting the visual world using natural language. The advancements in computing power and the sheer amount of visual and linguistic data available today help bring this goal closer. Research at the intersection of Computer Vision and Natural Language Processing is currently booming and the automatic generation of image captions has recently gained a lot of popularity. Several ideas and architectures have been proposed to machine-generate human-like sentences that describe images, but all fall short of human-level quality. The focus of this talk is to explore how the graph data structure can be used to connect the vision and language modalities in the context of image caption generation and how such graph-based models compare with the current state-of-the-art deep learning-based models.
Title: Enhancing machine learning with synthetic biometrics and documents
Speaker: Dr Norman Poh
Date and time: Wednesday 11 November, 12:00 noon
Venue: Zoom
Abstract: Developing machine learning models requires a lot of data. The more data there is, the lower the risk of overfitting. Traditionally, we have to collect real data. While this is laborious, annotating data is even more so. We resolved this by creating tools to annotate data automatically, leaving only refined annotation to be done by humans (using CVAT). Next, we explored options to generate synthetic data. By mixing real and synthetic data, we have solved a number of computer vision problems effectively, from object detection to classification and semantic segmentation. In this talk, I will share with you our journey in biometrics and document verification.
Title: A gentle introduction to quantum computing
Speaker: Dr John Abela
Date and time: Wednesday 28 October 2020, 12:00 noon
Venue: Zoom
Abstract: Quantum computing is an area of study that focuses on the development of computing devices that are based on the principles of quantum mechanics. Quantum mechanics is a theory that attempts to explain the nature and behavior of energy and matter on the microscopic (atomic and subatomic) level. Quantum computers use a combination of qubits (quantum bits) and the quantum phenomena of superposition and entanglement to perform specific computational tasks at a much higher efficiency than their classical counterparts. Quantum computers are not super-Turing powerful, but they provide an exponential speed-up over the best known classical algorithms for certain problems, such as integer factorisation, and for specific use cases. Development of quantum computers is progressing at a fast pace with billions of dollars being poured into research and development. Quantum computing has important implications for areas such as cryptography and is expected to transform drug discovery and design. In the talk we will introduce the basic ideas of quantum mechanics and then give a brief overview of how quantum computers work.
Title: Data warehousing and analytics
Speaker: Andrew Sammut
Date and time: 19 February, 12:00 noon
Venue: Communications Lab, Faculty of ICT (Room 1, Level 0, Block B)
Abstract: Building a data warehouse and maintaining it is not an easy feat. In order to design a proper data warehouse, one has to understand the data underlying the structure, as well as the question or problem that needs to be answered. When one fails to do so, this can be catastrophic for a business that is heavily dependent on its data. In his presentation Andrew Sammut will be discussing the best practices for designing data warehouses on the cloud, as well as providing solutions for some common challenges. He will demonstrate how DAX measures can be used in PowerBI to analyse and create models using the underlying data. Further to this, he will be demonstrating how AI models can be deployed on the cloud stack to be used for modelling purposes. This includes the comparison of different AI models used for time series forecasting including ARIMA, neural networks and decision trees.
Title: Mummies, tomography, and segmentation: The ASEMI Project
Speaker: Marc Tanti
Date and time: 15 January 2020, 13:00
Venue: Communications Lab, Faculty of ICT (Room 1, Level 0, Block B)
Abstract: Ancient Egypt is known for mummifying pharaohs, but did you know that they also mummified animals? In order to investigate this practice and the reasons behind it, archeologists at the ESRF use x-ray tomography in order to produce 3D scans of what is inside these mummies without destroying them. This results in a greyscale volume showing just the different densities of materials inside, but it would be more useful to be able to recognise and highlight the different objects such as bones, textiles, and biological tissues, a process that takes months of manual work to do. The ASEMI project is a research-based project to use computer vision techniques to automatically segment these 3D volumes into different materials, which should cut down the time required to analyse these mummies from a few months to a few days. This talk will go through the basics of animal mummies, tomography, and segmentation in an accessible way.
Title: A Heuristic Solution for the Selective Dial-a-Ride Problem
Speaker: Mark Cauchi
Date and time: 11 December 2019, 12:00 noon
Venue: Communications Lab, Faculty of ICT (Room 1, Level 0, Block B)
Abstract: Demand responsive transportation can relieve road congestion and pollution, offering reliable transportation at a cheap price. Such a service can be implemented as a mobile application, and requires three stakeholders, namely, the service provider, passengers, and drivers. An algorithm is required to assign passengers to drivers, while delivering some optimal solution based on quality of service and/or profits. This presentation addresses a variant of this problem referred to as the Selective Dial-A-Ride Problem (DARP). Solving large instances of this problem may be infeasible in reasonable computational time, and therefore a metaheuristic solution is often adopted. The Variable Neighbourhood Search (VNS) has emerged as the most prominent modern solver for such problems, with such advantages as control over the global and local searches of the algorithm. In this presentation, a VNS is shown with two algorithmic novelties. Statistical analysis of test results based on a local scenario yields invaluable information to the service provider, which they may share with application users, such as the expected profitability to the driver and the chances of being served to the passenger. Such information is indispensable for the successful adoption of this application.
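For readers unfamiliar with Variable Neighbourhood Search, the sketch below shows the basic scheme (shake in the k-th neighbourhood, run a local search, move and reset k on improvement, otherwise enlarge k). It is a generic textbook-style VNS on a toy routing problem, not the algorithm presented in the talk and not a Selective DARP solver: there are no pickups, time windows or request selection, and the stop coordinates, `k_max`, `iterations` and `seed` are hypothetical.

```python
import math
import random


def route_length(route, points):
    """Total length of an open route visiting the points in the given order."""
    return sum(math.dist(points[a], points[b]) for a, b in zip(route, route[1:]))


def local_search(route, points):
    """Descent over pairwise swaps until no swap shortens the route."""
    best = route[:]
    best_len = route_length(best, points)
    improved = True
    while improved:
        improved = False
        for i in range(len(best) - 1):
            for j in range(i + 1, len(best)):
                candidate = best[:]
                candidate[i], candidate[j] = candidate[j], candidate[i]
                cand_len = route_length(candidate, points)
                if cand_len < best_len - 1e-12:
                    best, best_len = candidate, cand_len
                    improved = True
    return best, best_len


def shake(route, k, rng):
    """Random point in the k-th neighbourhood: apply k random swaps."""
    shaken = route[:]
    for _ in range(k):
        i, j = rng.sample(range(len(shaken)), 2)
        shaken[i], shaken[j] = shaken[j], shaken[i]
    return shaken


def vns(points, k_max=3, iterations=50, seed=0):
    """Basic VNS: only strictly better routes are accepted."""
    rng = random.Random(seed)
    current = list(range(len(points)))
    current_len = route_length(current, points)
    for _ in range(iterations):
        k = 1
        while k <= k_max:
            candidate, cand_len = local_search(shake(current, k, rng), points)
            if cand_len < current_len - 1e-12:
                current, current_len = candidate, cand_len
                k = 1       # improvement: return to the smallest neighbourhood
            else:
                k += 1      # no improvement: try a larger neighbourhood
    return current, current_len


# Hypothetical stop coordinates.
stops = [(0, 0), (5, 1), (1, 0), (4, 1), (2, 0), (3, 1), (6, 0), (2, 3)]
best_route, best_length = vns(stops)
print(best_route, round(best_length, 3))
```

The neighbourhood index k is the "control over the global and local searches": small k keeps the search near the current solution, while larger k perturbs it more strongly to escape local optima.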
Title: Enhancing Satellite Imagery with Oriented Filters and Machine Learning
Speaker: David T. Lloyd
Date and time: 30 October 2019, 12:00 noon
Venue: Communications Lab, Faculty of ICT (Room 1, Level 0, Block B)
Abstract: There are currently over 600 Earth-observing satellites in orbit and that number is set to grow in the years to come. Together these satellites produce many hundreds of terabytes of data each day, with a significant fraction of this data made available for free to users. Naturally such large datasets are ripe for the application of newly developed tools originating in data science, notably those utilising deep learning. In this talk I will present some preliminary results from the SAT-FIRE project. I will describe a simple, computationally efficient filter we have developed for improving the quality and fidelity of satellite images acquired with poor signal-to-noise ratio. Further, I will show that computational super-resolution techniques based upon neural networks are able to enhance the resolution (and information content) of thermal images of the sea around the Maltese islands. Such enhanced images are of use for improving the accuracy of oceanographic models, with applications in climate modeling and directing sea rescue efforts.
Title: Machine Learning in Computer-Aided Drug Discovery (Workshop)
Speaker: Dr Jean-Paul Ebejer
Date and time: 24 May 2019, 12:00 noon
Venue: Networks Lab, Level -1 block B, Room 7, Faculty of ICT
Abstract: Computer-Aided Drug Design (CADD) plays an increasingly critical role in the drug-discovery process. CADD involves the application of computer algorithms to improve pharmaceutical productivity. These include algorithms for the identification of the biological target involved in a disease, toxicity and side-effect prediction, and searching a database for molecules which exhibit a therapeutic effect against a particular protein of interest. The latter is known as Virtual Screening. In this workshop I will give an overview of CADD with particular emphasis on virtual screening. We will develop a machine learning (ML) model to discriminate between actives and decoys against a protein target which plays a critical role in the life-cycle of HIV. This interdisciplinary talk is aimed at an audience with prior Python programming experience and an interest in the application of ML models in life sciences.