📹 Watch the Townhall Recording, where experts explain the problem and dive deeper into the dataset and baseline.
Mosquitoes, small yet perilous insects, transmit diseases that pose a serious threat to humans and the environment. Of the more than 3,600 known species, a few can transmit various pathogens, leading to widespread illnesses such as Zika, Dengue, and Chikungunya. Controlling mosquito populations is vital to prevent disease outbreaks and protect communities worldwide.
In collaboration with the Mosquito Alert citizen science project, we present the Mosquito Identification Challenge—an opportunity to impact public health initiatives directly. Traditional mosquito surveillance methods are expensive and time-consuming, but community-based approaches empower citizens to report and collect mosquito specimens. By leveraging machine learning and deep learning techniques, we aim to automate the labour-intensive image validation process, making mosquito identification more efficient and accurate.
As a participant, you will work with a real-world dataset of mosquito images, gaining hands-on experience in handling authentic data. Accurately identifying mosquito species contributes to early intervention and effective disease management. Join us in this challenge to combat mosquito-borne diseases and enhance public health through the power of AI.
📑 The Task
The competition centres on using advanced computer vision techniques to detect and classify small objects. In this challenge, participants will develop AI solutions that precisely identify mosquitoes within images captured by citizen contributors using their mobile devices.
A diverse dataset will be provided to participants, featuring real images contributed by citizens. These images will showcase mosquitoes in various contexts, encompassing different body positions, sizes, and lighting conditions. It's important to note that the dataset exhibits an unbalanced distribution of mosquito classes, posing an additional hurdle for participants to overcome. Participants must construct robust models that handle this disparity, ensuring accurate detection and classification across all categories.
The challenge entails the development of state-of-the-art computer vision algorithms that can locate mosquitoes with pinpoint accuracy, despite their diminutive size, within the given images. Moreover, participants are tasked with classifying the detected mosquitoes into predefined categories, enabling effective mosquito surveillance and analysis.
During the competition, participants will have access to a labelled dataset for model training, focusing specifically on mosquito detection and classification. The evaluation phase will assess the performance of participants' models using a separate dataset, measuring their capabilities in terms of detection accuracy, classification precision, recall, and F1 score.
The dataset for this challenge is derived from a citizen science project focused on mosquito identification. It comprises 10,700 real-world images of mosquitoes captured by participants using mobile phones. These images offer a diverse representation of mosquitoes in various scenarios and locations. Each image is labelled with bounding box coordinates and mosquito class information.
The dataset has been split into training and testing sets, with 75% of the images allocated for training (8,025 images) and 25% for testing (2,675 images). This division provides participants with ample training data to develop their models and a separate evaluation set to assess the performance and generalisation of their AI algorithms.
A CSV file is provided with the dataset: "train.csv" for the training dataset. The file contains accurate annotations provided by expert entomologists, indicating the location and class of mosquitoes in each image. A file with annotations for the testing subset, called "test.csv", will be released in the second phase of the challenge.
The CSV file includes important information such as the image file name, width, and height. The bounding box coordinates, represented by four columns ("bbx_xtl", "bbx_ytl", "bbx_xbr", "bbx_ybr"), define the rectangular region where the mosquito is located. The top-left coordinates represent the starting point, while the bottom-right coordinates indicate the ending point of the bounding box.
It's worth noting that most images contain a single mosquito with its corresponding bounding box and class label. However, there are rare cases where multiple mosquitoes can be present in one image, reflecting real-world scenarios. For consistency and compatibility reasons, the convention has been to assign a bounding box and class label to only one mosquito per image, even if multiple mosquitoes are visible.
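As a rough illustration of the annotation layout described above, the sketch below parses rows of this shape with Python's standard csv module and derives how much of each image its bounding box covers. The four "bbx_*" columns and "class_label" are named in this description; the column names "img_fName", "img_w" and "img_h", the file names, and all values in the sample rows are assumptions made for the example, so check them against the real "train.csv" header.

```python
import csv
import io

# Two made-up rows for illustration only. "img_fName", "img_w" and "img_h"
# are ASSUMED column names; only the bbx_* columns and "class_label" are
# named in the dataset description. The label strings are placeholders too.
SAMPLE_CSV = """img_fName,img_w,img_h,bbx_xtl,bbx_ytl,bbx_xbr,bbx_ybr,class_label
example_0001.jpeg,4000,3000,1000,900,1400,1200,Aedes albopictus
example_0002.jpeg,1024,768,512,384,640,480,Culex
"""


def load_annotations(handle):
    """Parse annotation rows and add the box size and its share of the image."""
    records = []
    for row in csv.DictReader(handle):
        xtl, ytl = int(row["bbx_xtl"]), int(row["bbx_ytl"])  # top-left corner
        xbr, ybr = int(row["bbx_xbr"]), int(row["bbx_ybr"])  # bottom-right corner
        img_w, img_h = int(row["img_w"]), int(row["img_h"])
        box_w, box_h = xbr - xtl, ybr - ytl
        if box_w <= 0 or box_h <= 0:
            raise ValueError(f"degenerate box in {row['img_fName']}")
        records.append({
            "file": row["img_fName"],
            "box": (xtl, ytl, xbr, ybr),
            "class_label": row["class_label"],
            # Fraction of the image area covered by the mosquito's box.
            "area_fraction": (box_w * box_h) / (img_w * img_h),
        })
    return records


records = load_annotations(io.StringIO(SAMPLE_CSV))
for rec in records:
    print(rec["file"], rec["class_label"], f"{rec['area_fraction']:.4f}")
```

To run the same function on the real file, pass an open file handle instead, for example `open("train.csv", newline="")`. The area fraction is a quick way to see how small the targets are relative to the full image.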
The dataset consists of six distinct classes, including two species and three genus classes, as well as a class for a species complex. Here is a summary of the classes and their descriptions:
- Aedes aegypti - Species
- Aedes albopictus - Species
- Anopheles - Genus
- Culex - Genus (Species classification is challenging, so it is given at the genus level)
- Culiseta - Genus
- Aedes japonicus/Aedes koreicus - Species complex (Difficult to differentiate between the two species)
The provided CSV file includes a "class_label" column that specifies the class name corresponding to each mosquito image.
Notes on two of the class names in the CSV:
- Culex - Species of this genus are very difficult to distinguish, so classification is given at genus level.
- Aedes japonicus/Aedes koreicus - In some cases experts can identify an individual as belonging to one of these two species without being able to clearly differentiate between them, because the two species are very similar.
The dataset is imbalanced, with more images belonging to certain classes, particularly Aedes albopictus. This is due to the focus of the dataset on Aedes albopictus in its early stages. It is important to consider this class imbalance when developing and evaluating models.
The distribution of classes in the dataset's training and testing subsets is provided in Table 2. Careful attention should be given to this class distribution during model development and evaluation to ensure balanced performance across all classes.
Table 2: Class name by dataset subset (training, testing).
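One common way to account for class imbalance during training is to weight each class inversely to its frequency. The sketch below is only one possible approach, not something prescribed by the organisers, and the label counts in it are invented for illustration; they are not the real class distribution.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return a weight per class: n_samples / (n_classes * class_count).

    Rare classes get weights above 1, frequent classes get weights below 1,
    and the weights summed over all samples equal the number of samples.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Made-up label list: 6 frequent, 3 medium, 1 rare example.
labels = ["Aedes albopictus"] * 6 + ["Culex"] * 3 + ["Anopheles"] * 1
weights = inverse_frequency_weights(labels)

for cls, weight in sorted(weights.items(), key=lambda item: item[1]):
    print(f"{cls}: {weight:.3f}")
```

Weights like these can be passed to a weighted loss function or used to drive a weighted sampler. Which option works better depends on the model and is worth checking against a validation split that preserves the class distribution.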
📥 External Data Usage
📚 Phase 1 and Phase 2 Differences
- Test set - Phase 1 and Phase 2 will have different test sets. The annotations of the Phase 1 test set will be released after the end of Phase 1.
- Phase 1 uses a predictions submission format, where participants upload their predictions directly.
- Phase 2 will use code submissions: participants upload their inference code, which runs directly on AIcrowd servers, and scores are updated live.
The final prizes will be decided based on Phase 2 submissions only.
The primary objective of the model is classification. However, it is important that the object detection component also satisfies a minimum IoU threshold of 0.75.
- For classification, we use Macro F1 Score
- For object detection, we use Mean IoU (intersection over union)
The primary metric for the classification task is the Filtered Macro F1 score. For every image whose predicted bounding box has an IoU lower than 0.75, the classification prediction is replaced with a dummy class. The Filtered Macro F1 score is the Macro F1 score on the test set after these dummy class replacements.
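The evaluation rule described above can be sketched in plain Python as follows. This is an illustration and not the official scoring code: the dummy class name is a hypothetical placeholder, and the sketch assumes that the macro average is taken over the classes that appear in the ground truth, which the description does not state explicitly.

```python
IOU_THRESHOLD = 0.75
DUMMY_CLASS = "__dummy__"  # hypothetical placeholder name, not from the organisers


def iou(box_a, box_b):
    """Intersection over union of two (xtl, ytl, xbr, ybr) boxes."""
    inter_w = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filtered_macro_f1(true_boxes, true_labels, pred_boxes, pred_labels):
    """Macro F1 after replacing low-IoU predictions with a dummy class."""
    filtered = [
        label if iou(t_box, p_box) >= IOU_THRESHOLD else DUMMY_CLASS
        for t_box, p_box, label in zip(true_boxes, pred_boxes, pred_labels)
    ]
    # ASSUMPTION: average over ground-truth classes only (dummy excluded).
    f1_scores = []
    for cls in sorted(set(true_labels)):
        pairs = list(zip(true_labels, filtered))
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Four made-up images: every class prediction is right, but the third box
# overlaps the ground truth too little and is therefore filtered out.
true_boxes = [(0, 0, 100, 100)] * 4
true_labels = ["Culex", "Culex", "Anopheles", "Anopheles"]
pred_boxes = [(0, 0, 100, 100), (0, 0, 100, 90), (50, 50, 150, 150), (0, 0, 100, 100)]
pred_labels = ["Culex", "Culex", "Anopheles", "Anopheles"]

score = filtered_macro_f1(true_boxes, true_labels, pred_boxes, pred_labels)
mean_iou = sum(iou(t, p) for t, p in zip(true_boxes, pred_boxes)) / len(true_boxes)
print(f"filtered macro F1 = {score:.4f}, mean IoU = {mean_iou:.4f}")
```

In this toy case a correct class prediction still counts as a miss because its box falls below the threshold, which is why localisation quality matters even though classification is the primary objective.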
✍️ Submission Format
- Phase 1 - Each submission must provide the bounding box coordinates of the top-left and bottom-right corners, with a single bounding box per image. Each submission should also contain the class name as a string. See "sample_submission_phase1_v2.csv" in the Resources tab.
- Phase 2 - We will provide a code submission starter kit with submission instructions at the start of Phase 2.
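Because each image may carry only a single bounding box, a detector that returns several candidates per image needs a reduction step before writing predictions. The sketch below keeps the highest-scoring candidate per image and writes one row per image. The "score" field, the "img_fName" column, and the reuse of the training CSV column names for the output are all assumptions; the sample submission file in the Resources tab is the authority on the actual format.

```python
import csv
import io


def best_detection_per_image(detections):
    """Keep only the highest-scoring detection for each image."""
    best = {}
    for det in detections:
        current = best.get(det["image"])
        if current is None or det["score"] > current["score"]:
            best[det["image"]] = det
    return best


def write_submission(best, handle):
    """Write one row per image. Column names are ASSUMED, not official."""
    fieldnames = ["img_fName", "bbx_xtl", "bbx_ytl", "bbx_xbr", "bbx_ybr", "class_label"]
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    for image in sorted(best):
        xtl, ytl, xbr, ybr = best[image]["box"]
        writer.writerow({
            "img_fName": image,
            "bbx_xtl": xtl, "bbx_ytl": ytl,  # top-left corner
            "bbx_xbr": xbr, "bbx_ybr": ybr,  # bottom-right corner
            "class_label": best[image]["class_label"],
        })


# Made-up detector output: two candidates for a.jpeg, one for b.jpeg.
detections = [
    {"image": "a.jpeg", "score": 0.40, "box": (10, 10, 50, 60), "class_label": "Anopheles"},
    {"image": "a.jpeg", "score": 0.85, "box": (12, 8, 52, 58), "class_label": "Culex"},
    {"image": "b.jpeg", "score": 0.70, "box": (5, 5, 40, 45), "class_label": "Culiseta"},
]
best = best_detection_per_image(detections)
buffer = io.StringIO()
write_submission(best, buffer)
submission_text = buffer.getvalue()
print(submission_text)
```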
- Start Date: 20th June, 2023
- Phase 1 End Date: 20th August, 2023
- Phase 2 End Date: 20th October, 2023
- Winner Announcement: 10th November, 2023
The prizes for the challenge are as follows:
- 🥇 First place: AMLD Travel Grant & Conference Access + Apple MacBook Pro (M2)
- 🥈 Second place: AMLD Travel Grant & Conference Access
- 🥉 Third place: AMLD Travel Grant & Conference Access
The challenge prize includes a Travel Grant of 2500 CHF per team and conference access to the Applied Machine Learning Days (AMLD), a global platform for AI & Machine Learning. The winning teams will have the opportunity to attend the conference in person, with travel and accommodation expenses covered.
AMLD Conference 2024
AMLD 2024 is a four-day conference with over 2000 attendees, focused on the application of machine learning and its impact on science and society. It provides a platform for exploring the real-life applications of AI and Machine Learning technologies. The conference brings together experts and professionals from various fields to discuss the latest advancements and insights in the industry. Attendees will have the opportunity to gain valuable knowledge and network with industry leaders in the field of AI and Machine Learning.
Please note that if a winning team decides not to accept the travel grant or chooses not to attend the AMLD 2024 conference, the travel grant will be forfeited and passed on to the next team on the leaderboard. Virtual attendance at AMLD 2024 is currently not available. The first-place team will have the option to retain the Apple MacBook Pro prize irrespective of its decision regarding the travel grant.
Have queries or feedback, or looking for teammates? Drop a message on AIcrowd Community. To reach the Mosquito Alert team, please use email@example.com for all communication. Don't forget to hop onto the Discord channel to collaborate with fellow participants and connect directly with the organisers. Share your thoughts, spark collaborations, and get your queries addressed promptly.