Ongoing Projects

For a list of completed projects, please visit here.

RGC/GRF 15203120: “Securing Models and Data for Machine Learning at the Edge” (2021-2023, HK$845,055)

Abstract: Edge computing refers to gathering, analyzing, and acting on data at the periphery of the network, as close to the data sources as possible. This is achieved by deploying edge computing devices (or simply, edge devices) that can triage and even process raw data wherever and whenever the data are generated. Gartner Inc. estimated that by 2021 more than 25 billion edge devices would be connected to the Internet, and that by 2022 more than 50% of enterprise data would be created and processed outside the cloud or data center, most of it coming from edge devices.

As machine learning (ML) has been widely adopted in computer vision, human-computer interaction, and IoT, many edge devices have become smart. Equipped with capable GPUs/CPUs, they can complete ML tasks such as object detection in video surveillance, voice control in smart homes, and intelligent sensing in industrial IoT. Compared with cloud-based ML solutions, machine learning at the edge reduces network traffic, lowers response time, and avoids a single point of failure.

However, while cloud servers are hosted in physically secured bunkers and protected by firewalls, edge devices are more exposed to hostile environments and, owing to their limited hardware/software resources, harder to protect. In this project, we will investigate two common threats to ML models and data, namely model extraction and sample poisoning. To counter them under different security assumptions, we will propose a variety of protective schemes based on modern cryptographic and privacy-preserving tools such as differential privacy, homomorphic encryption, oblivious RAM (ORAM), and aggregate message authentication codes.
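Of the tools listed above, an aggregate message authentication code is the easiest to illustrate in a few lines. The following is a minimal sketch, not the scheme proposed in this project: it assumes the XOR-of-tags construction, in which each hypothetical edge device holds its own HMAC-SHA256 key shared with the verifier and the individual tags are XOR-ed into one. The key and sample values are made up for illustration.

```python
import hashlib
import hmac
from functools import reduce


def device_tag(key: bytes, sample: bytes) -> bytes:
    """HMAC-SHA256 tag computed by one (hypothetical) edge device over one sample."""
    return hmac.new(key, sample, hashlib.sha256).digest()


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length tags."""
    return bytes(x ^ y for x, y in zip(a, b))


def aggregate(tags: list[bytes]) -> bytes:
    """Compress many tags into a single 32-byte tag by XOR-ing them together."""
    return reduce(xor_bytes, tags)


def verify_aggregate(keys: list[bytes], samples: list[bytes], agg_tag: bytes) -> bool:
    """Recompute every device's tag, aggregate them, and compare in constant time."""
    expected = aggregate([device_tag(k, s) for k, s in zip(keys, samples)])
    return hmac.compare_digest(expected, agg_tag)


if __name__ == "__main__":
    # Hypothetical keys and samples for three edge devices.
    keys = [b"device-key-1", b"device-key-2", b"device-key-3"]
    samples = [b"frame-001", b"frame-002", b"frame-003"]

    agg = aggregate([device_tag(k, s) for k, s in zip(keys, samples)])
    print(verify_aggregate(keys, samples, agg))   # True

    poisoned = [b"frame-001", b"frame-XXX", b"frame-003"]
    print(verify_aggregate(keys, poisoned, agg))  # False
```

The verifier receives one 32-byte tag for n samples instead of n tags, which helps on constrained edge links; the trade-off is that a failed check shows that some sample was altered, not which one.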

ITF-PRP, PRP/051/19FX: “vMPOS: Virtual Mobile POS using Smartphone and Near Field Communication” (2020-2023, HK$3,927,250)

Abstract: Thanks to the boom in the mobile industry, smartphones, and fintech, mobile payment has achieved tremendous penetration in the past decade. However, individual businesses and SMEs face a barrier to accepting contactless cards or NFC payments (e.g., Apple/Google/Samsung Pay), because they need to purchase or rent a dedicated card reader and payment processor, the key components of a mobile Point of Sale (mPOS) system. In this project, we will build a virtual mobile POS system (vMPOS) on a smartphone instead of on a physical device. Our main rationale is that almost all modern smartphones are equipped with both NFC modules, which can read data from contactless devices, and secure computing hardware, such as a Secure Element (SE) or a more general-purpose Trusted Execution Environment (TEE). With our expertise in secure data processing and our industrial experience in mobile payment system development, we believe vMPOS will help thousands of individual businesses and SMEs embrace the technical advances in mobile computing and fintech.

RGC/GRF 15218919: “Auditing Machine Learning as a Service” (2020-2022, HK$731,089)

Abstract: Thanks to the boom in cloud computing, Machine Learning as a Service (MLaaS) provides an inexpensive solution to two scarce resources in machine learning: high computational power and experienced data scientists. Users of MLaaS provide training samples from their business domains, and the MLaaS provider selects a suitable ML model and learning algorithm, executes the training, and provides an inference service to clients for unknown samples. Despite its young age, MLaaS has already gained immense popularity in domains such as natural language processing (e.g., Google Cloud Translation API), computer vision (e.g., Microsoft Azure Face API), and speech recognition (e.g., Amazon Lex). All leading public cloud vendors offer MLaaS, such as Amazon ML, Microsoft Azure ML Studio, Google Prediction API, and IBM Watson ML. Driven by the unprecedentedly fast growth of data volume in Storage-as-a-Service (SaaS) and Database-as-a-Service (DBaaS), the MLaaS industry is forecast to grow at 49 percent annually from 2017 to 2023.

However, just as users cannot fully trust SaaS and DBaaS, they cannot fully trust MLaaS either. In essence, because MLaaS is built upon the former two, it inherits their integrity issues caused by resource exhaustion, service outages, media failure, hacking attacks, or even corporate dishonesty. In practice, the adversarial machine learning literature has already demonstrated various attacks, such as model extraction, training data poisoning, and adversarial example (model evasion) attacks. Unfortunately, traditional auditing and integrity assurance techniques such as Merkle hash trees no longer work in MLaaS, owing to two challenges unique to it. First, unlike standard I/O operations in SaaS and SQL queries in DBaaS, machine learning models are highly complex, and their training and inference results are usually uncertain, so there is no single correct output to check against. Second, MLaaS offers no guarantee of the integrity of training samples, which may come from distant data sources such as IP cameras and smart speakers through insecure networks and untrusted delegates.
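For contrast, here is a minimal sketch of why a Merkle hash tree works well for outsourced storage: the client keeps only a 32-byte root, and any stored block can be checked against it with a proof of logarithmic size. The sketch assumes SHA-256, duplicates the last node on odd-sized levels, and uses 0x00/0x01 prefixes to separate leaf hashes from internal-node hashes; these are our illustrative choices, and all function names are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(blocks: list[bytes]) -> list[list[bytes]]:
    """Return every tree level, from the leaf hashes up to the single root."""
    level = [h(b"\x00" + b) for b in blocks]  # 0x00 prefix marks leaf hashes
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # duplicate last node on odd-sized levels
        level = [h(b"\x01" + level[i] + level[i + 1])  # 0x01 marks internal nodes
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(levels: list[list[bytes]], index: int) -> list[tuple[bytes, bool]]:
    """Collect (sibling hash, current node is a left child) pairs from leaf to root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):
            sibling = index  # mirrors the duplication done in build_levels
        proof.append((level[sibling], index % 2 == 0))
        index //= 2
    return proof


def verify_proof(root: bytes, block: bytes, proof: list[tuple[bytes, bool]]) -> bool:
    """Rehash the block up the path and compare with the trusted root."""
    node = h(b"\x00" + block)
    for sibling, is_left in proof:
        node = h(b"\x01" + node + sibling) if is_left else h(b"\x01" + sibling + node)
    return node == root


if __name__ == "__main__":
    blocks = [b"block-0", b"block-1", b"block-2", b"block-3", b"block-4"]
    levels = build_levels(blocks)
    root = levels[-1][0]  # the only value the client needs to keep
    print(verify_proof(root, b"block-4", prove(levels, 4)))   # True
    print(verify_proof(root, b"tampered", prove(levels, 4)))  # False
```

The check only succeeds when the server returns exactly the bytes that were hashed originally, which is why it suits storage and query results but, as the paragraph above argues, not ML training and inference results that are usually uncertain.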

In this project, we propose mechanisms to audit MLaaS throughout the entire machine learning cycle. We focus on the integrity of MLaaS (against training fraud), training samples (against sample poisoning), and clients who request MLaaS inference service (against model extraction).

RGC/GRF 15222118: “Integrity Assurance for Vehicular Telematics Data” (2019-2021, HK$693,000)

Abstract: According to the U.S. Department of Transportation, “vehicular telematics” refers to the technology that combines telecommunications and informatics to send, receive, and store information related to vehicles. With the increasing role of computers in automobiles and the prevalence of electric vehicles, telematics data have recently been gaining rapid attention from the automobile industry, transportation services, and the environmental protection sector. For example, U.S. Executive Order 13693 required all U.S. agencies’ new fleet vehicles, by 2017, to collect the maximum telematics data, such as fuel consumption, emissions, maintenance, utilization, and speed, for sustainability and greenhouse gas emission reduction.

The three primary use cases of telematics data are location-based services (such as turn-by-turn navigation), vehicular safety (such as driving statistics collected by car insurance companies), and intelligent transportation (such as fleet management and ride-sharing). As these telematics data become more mission-critical and monetarily valuable, assuring their integrity against unauthorized modification and forgery is essential. Unfortunately, due to the diversity of this ecosystem and vulnerabilities in both hardware and software, vehicular telematics data might not be fully trustworthy. For example, the security firm Trend Micro recently reported a fundamental security flaw in the CAN protocol that enables hackers to spoof new data frames to any part of a car. Car owners could use such forged data to gain illegal profits, such as obtaining a lower car insurance premium by lowering the recorded driving speed, or committing fraud in ride-sharing applications by completing phantom trips without passengers.

Fortunately, system-on-chip manufacturers have made hardware security freely and ubiquitously available through the Trusted Execution Environment (TEE) specification. In this project, we will seize this opportunity to assure the integrity of vehicular telematics data against malicious subjects, with the help of TEEs and/or peer vehicles. To put our solutions into practical use, we plan to design and develop a system called “integrity-assured telematics for vehicles” (iATV), which compels vehicles to report their true telematics data for mission-critical tasks.