Open Food Facts - Google Summer of Code (GSOC) 2022

Welcome!

Open Food Facts is a non-profit open source and open data project with a massive impact on the food and health of millions of people across the world, and we need your help to make this impact even bigger in many more parts of the world!

Open Food Facts is a participating organization for the Google Summer of Code 2022.

Getting started

If you would like to work on Open Food Facts projects during the Google Summer of Code, make sure to join the Open Food Facts community on Slack, introduce yourself there and get familiar with the project.

A good way to do that is to open your fridge or cupboard to find a food item to scan with our Android or iOS application. Open Food Facts works like Wikipedia. The food product may already be in the database with its data completed. If it’s present, you will get valuable information about its nutritional quality (Nutri-Score) or its level of food processing (NOVA). If it’s not present yet, please take some pictures and add it!

What we're looking for

Please note that there are likely to be more students applying than we will have places allocated by Google, so make sure your application is a good one. We are looking for:

  • Enthusiasm for your proposed project
  • Experience with the tools you will need to complete the project (e.g. demonstrated ability in the relevant programming language or environment)
  • Information on how you would approach the project, what time you can put into it and what you think you will be able to achieve over the GSoC period

Make a Small Contribution

We want to make sure that new contributors are familiar with our development workflow and the tools involved in it. Make at least one small code contribution to an Open Food Facts project (preferably related to the project you want to work on during GSoC) to demonstrate that you can build the application from the source code, make changes to it, and submit a pull request in the project's GitHub repository.

Decide on a GSoC project

Our mentors have put together a list of project ideas that you can choose from. Project proposals are not strictly limited to the ones listed on this project ideas page. You can contact the main developers of any Open Food Facts project on a public channel, introduce yourself, and suggest your own project ideas. Base any ideas you propose on a little research into the needs of the project and/or its users, and make sure there is interest from a mentor before basing your application on it.

Fill out the Application

Once the application period has opened, you have to submit your application on the Google Summer of Code website. Your application must be written in English. It should contain a detailed description of your project proposal.

Copy our Google Document template and make sure you answer all of the questions.

Please be factual and clear in answering these questions. Feel free to add anything else that is relevant for your application. It is never too early to start working on your GSoC application! Note that GSoC positions are very competitive (with about 4 applicants for one position in the past). The key to creating a strong proposal is to propose a manageable and agreed-upon project, make a contribution to the module your proposal is related to, and write an application that clearly demonstrates your knowledge, skills, and enthusiasm.

Getting started now tasklist

The following things can help you select a project and prepare your idea:

  • Join our chat room
  • Introduce yourself in the #summerofcode channel
  • Join the channels related to the language/project/countries you're interested in
  • It is important to first familiarize yourself with the software. Try to make it run on your device/machine.
  • Once you're able to run it, try one of our "good first issues" on GitHub
  • Please, go through the available tutorials and direct your questions to the Slack chat, in the most specific channel possible.
  • As you learn, it is also a good idea to propose updates to the documentation.
  • If you have not worked with Git branching before, we encourage you to visit this website: https://learngitbranching.js.org/
  • Look at the list of all the Open Food Facts projects
  • Read the project's README on the repository, feel free to ask for necessary clarifications
  • Lurk on the project's chat channel
  • Look at the recent changes in the project's GitHub repository
  • Fill out our introduction survey (it is NOT an application form)

Areas where your help can make a big difference

Open Food Facts does many things, such as food product data acquisition and analysis on the backend, and data visualization on the Open Food Facts website and mobile apps. This page presents the major systems and the technology we use, and the current challenges you could help us address during GSOC.

You can get a full overview of our repositories on GitHub, along with prioritized lists of impactful things you can work on. Here's our proposed selection of ideas:

 

Project 1: Implement an offline mode for the new Open Food Facts Flutter application

Description: 

Our app is used by 1 million users every month to scan food products, decrypt their labels, compare their nutritional and environmental quality, to get alerts on allergens, and to add or complete products in the Open Food Facts database.

Users often contribute or scan from the basement of supermarkets, and many places around the world do not have perfect connectivity. The purpose of this project is to ensure the mobile application can work completely offline.

You can install the new app on Android or iPhone/iPad. Note that an internal development build with the new UI is available (Android or iPhone/iPad).


Expected outcomes: 

  • Store the data of already scanned or opened products to make it available offline

  • Make it possible to edit products while offline (e.g. adding new photos): store the changes locally and synchronize them with the server when connectivity becomes available

  • Stretch goal: make it possible to preload data for the most popular products of a country

 

  • GitHub: openfoodfacts-dart and smooth-app

  • Slack channels: #flutter #smoothie

  • Potential mentors: Pierre Slamich, Stéphane Gigandet

  • Project duration: 350 hours

  • Skills required: Flutter, Dart

  • Difficulty rating: Medium
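
To make the offline editing outcome of Project 1 more concrete (store changes locally, then synchronize them when connectivity returns), here is a minimal sketch of a pending-edit queue. It is written in Python purely for illustration, since the real app is built with Flutter and Dart. The names `PendingEdit`, `OfflineQueue` and `upload`, as well as the barcode, are hypothetical and not taken from the smooth-app code.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PendingEdit:
    """One change made while offline (hypothetical structure)."""
    barcode: str
    changes: dict


@dataclass
class OfflineQueue:
    """Keeps edits made offline until they can be sent to the server."""
    pending: list = field(default_factory=list)

    def record(self, barcode: str, changes: dict) -> None:
        # Always store locally first, so nothing is lost without a connection.
        self.pending.append(PendingEdit(barcode, changes))

    def synchronize(self, upload: Callable[[PendingEdit], bool]) -> int:
        """Try to upload every pending edit; keep the ones that fail."""
        still_pending = []
        sent = 0
        for edit in self.pending:
            if upload(edit):  # `upload` returns True when the server accepted it
                sent += 1
            else:
                still_pending.append(edit)
        self.pending = still_pending
        return sent


queue = OfflineQueue()
queue.record("1234567890123", {"photo": "front.jpg"})  # placeholder barcode
# Later, when the device is back online, pass a real upload function:
sent = queue.synchronize(lambda edit: True)
```

A real implementation would also need to persist the queue on disk and decide what to do when an earlier edit fails but a later one for the same product succeeds.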

 

Key point: while the Open Food Facts data is a very interesting base to conduct research projects, our key goal is not only to research and train models, but to actually deploy working high precision models that make a difference. A good strategy could be to address first a subset of a problem with a solution that can be easily extended over time.

Project 2: Automatically extract nutrition facts data from photos of food products

Description:

Nutrition facts are the most useful information about food products (e.g. to compute nutritional scores such as the Nutri-Score, which compare the nutritional quality of food products). They are also the most tedious for users to enter manually, as it’s easy to make mistakes when inputting lots of numbers on a phone.


The project goal is to make it possible to automatically extract nutrition facts data from a photo of the nutrition facts table, and save our contributors thousands of hours of manual work.


It should be part of Robotoff, our system to extract valuable information out of the many images contributors send us. Robotoff uses a mix of classic techniques (such as regular expressions on the output of Google Cloud Vision OCR) and machine learning to generate “guesses” that it can apply automatically if it is confident about them, or ask users for a validation.
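
As an illustration of the "regular expressions on OCR output" approach described above, here is a minimal sketch. It is not Robotoff's actual code: the pattern, the function names `extract_nutrients` and `decide`, and the 0.9 threshold are all assumptions made for this example.

```python
import re

# Hypothetical pattern: a nutrient name, an optional colon, a number, a unit.
NUTRIENT_PATTERN = re.compile(
    r"(?P<name>energy|fat|sugars|salt|proteins?)\s*:?\s*"
    r"(?P<value>\d+(?:[.,]\d+)?)\s*(?P<unit>kcal|kj|mg|g)\b",
    re.IGNORECASE,
)


def extract_nutrients(ocr_text: str) -> dict:
    """Return {nutrient: (value, unit)} for each nutrient found in the text."""
    found = {}
    for match in NUTRIENT_PATTERN.finditer(ocr_text):
        name = match.group("name").lower()
        # Accept both "12.5" and "12,5" as decimal notations.
        value = float(match.group("value").replace(",", "."))
        unit = match.group("unit").lower()
        found.setdefault(name, (value, unit))  # keep the first occurrence only
    return found


def decide(confidence: float, threshold: float = 0.9) -> str:
    """Apply a guess automatically only when confidence is high enough."""
    return "apply" if confidence >= threshold else "ask_user"


guesses = extract_nutrients("Energy 250 kcal\nFat: 12,5 g\nSugars 3 g")
```

A pattern like this breaks down quickly on real packaging (multiple columns, per-serving values, several languages), which is why the project asks for a dedicated model.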


Expected outcomes: 

  • Create a model to automatically recognize and crop the nutrition facts table from a photo of a product's packaging.

  • Create a model to extract the individual nutrition facts from the nutrition facts photo

  • Create tests for the models. 

  • Integrate the model in Robotoff

Notes:

  • Accuracy is key here, as this information is precise and needs to be. In this area, it is better to have less data but have it right. There have been previous attempts to solve this problem, but they are not operational.

  • The proposed algorithm may use OCR (we use Google Cloud Vision OCR) as part of the process.

  • The main complexity is likely to lie in the layout analysis, as nutritional information is often presented as a table with multiple columns, row styles, etc.

  • The applicant must not concentrate only on the research work, but also take care of the integration in Robotoff, including any “business” rules to apply after extraction (on the Robotoff side only).

  • The technical stack is Python and TensorFlow (we use TensorFlow Serving), but other tools might be included in agreement with the Open Food Facts team.

 

  • Slack channels: #robotoff

  • Potential mentors: Alex Garel, Raphaël B

  • Project duration: 350 hours

  • GitHub: robotoff

  • Skills required: Machine Learning, Python

  • Difficulty rating: Hard
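
The layout analysis mentioned in the notes for Project 2 can start from something as simple as grouping OCR words into table rows by their vertical position. The sketch below assumes that the OCR step yields each word with the centre of its bounding box; the `Word` type and the `tolerance` value are hypothetical, and this is only one possible starting point, not a proposed solution.

```python
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    x: float  # horizontal centre of the word's bounding box
    y: float  # vertical centre of the word's bounding box


def group_into_rows(words, tolerance=5.0):
    """Group words into rows (top to bottom), each read left to right."""
    rows = []
    for word in sorted(words, key=lambda w: w.y):
        # Stay on the current row while the vertical gap is small enough.
        if rows and abs(word.y - rows[-1][-1].y) <= tolerance:
            rows[-1].append(word)
        else:
            rows.append([word])
    return [[w.text for w in sorted(row, key=lambda w: w.x)] for row in rows]


table = group_into_rows([
    Word("12", 120, 31), Word("Fat", 10, 30), Word("g", 150, 29),
    Word("Salt", 10, 60), Word("1", 120, 61),
])
```

Photos of curved or tilted packaging will not produce such clean rows, so a fixed tolerance is unlikely to be enough on its own.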

Project 3: Pushing our Hunger Games annotation engine to the next level

Description: 

Hunger Games is a web application that builds on the insights extracted from Open Food Facts crowdsourced data by our machine learning bot, Robotoff. Contributors use it to apply insights found by Robotoff that need human validation, and to annotate new things to help train machine learning models.

It is geared towards productive contributors who want to get through a lot of work by doing a single activity effectively for some minutes or more.

Currently, Hunger Games is very rough, without much explanation or onboarding for new users and activities.

Expected outcomes:

  • The purpose of this project would be to upgrade Hunger Games by providing more help to the user. 

  • This means more auto-completion and validations, more help messages during activities (tracking if it was read). 

  • It should also prompt users who are not logged in to log in or create an Open Food Facts account.

  • Integrate with our gamification service (openfoodfacts-events), while also defining quests and rewards, giving more stats to users, and enriching filters to help them focus.

  • Another objective would be to make it usable from a mobile device, so that users can make use of a 10-minute gap while waiting in a queue or commuting.

  • Finally, usability and performance improvements could be implemented to make contributions even more effective.

 

  • GitHub: openfoodfacts-hungergames

  • Skills required/preferred: Vue.js, JavaScript, possibly other languages (the Hunger Games application is written in Vue.js)

  • Slack channels: #hunger-games

  • Potential mentors: Alexandre Fauquette

  • Project duration: 175 or 350 hours, depending on the agreed scope

  • Difficulty rating: Easy

Project 4: Create a very simple Flutter interface to Folksonomy Engine (our Food Knowledge Graph)

Description: 

Making choices regarding food often depends on the food category considered. Some criteria are especially important when choosing a fish or a ready-made meal. Showing and explaining those criteria in context (and whether the product meets them) when scanning a product would be very useful.

The Food Knowledge graph is a system to augment product data, based on a taxonomy of knowledge. This is implemented as a separate service in Python.

Expected outcomes:

  • Implement CRUD interfaces for key/values manipulation

  • Implement CRUD interfaces for property creation

  • Implement Private/Public key/value management

  • Bundle that into a UI library

  • Create a sample Flutter application making use of this library

Our challenge:

This UI is a new project, everything will need to be done: define functional specs, establish technical specs, design it, develop it, run it, document it, etc.

  • GitHub: Folksonomy Engine (Python/FastAPI)

  • Skills required/preferred: Flutter

  • Slack channels: #folksonomy_engine

  • Potential mentors: Christian Quest, Charles Népote

  • Project duration: 175h

  • Difficulty rating: Easy

Project 5: A public and private list engine: Open Lists

Description:

There are many things that can be done with food product data besides decrypting labels and comparing products. For instance, you can track how much food you eat (and how healthy or environmentally friendly it is), keep an inventory of the food you have at home or in a small store, manage a private or shared shopping list, create virtual collections of products that share some specific traits, etc. Many of those things could be done with a new Open List system that would allow people to create and manage lists of products that can be private or public, individual or shared, with a history of edits and transactions, with optional tags and an optional amount of products. Think of it as a Google Drive for food product lists!

Expected outcomes:

  • Choose the backend for lists (Folksonomy Engine or other alternatives should be considered)

  • Implement CRUD editing of lists and items

  • Implement addition through scan

  • Implement custom properties for lists

  • Implement offline display and editing

 

  • GitHub: Folksonomy Engine (Python/FastAPI)

  • Skills required/preferred: Python

  • Slack channels: #folksonomy_engine

  • Potential mentors: Christian Quest, Charles Népote

  • Project duration: 350h

  • Difficulty rating: Medium

Project 6: Setup an external authentication service

Description: 

Contributors to Open Food Facts have an account. Currently, the management of those accounts is implemented directly in ProductOpener. The system lacks some capabilities one would expect from a user-friendly website or from a user management tool, and it also lacks many of the capabilities that third-party applications require, such as OAuth.

We would like to change that to rely on an external service. However, as we really care for our user data, the service should be open source and hosted on our infrastructure.

The goal of this project is to provide more features for users and app developers, while keeping the ProductOpener code less complex (separation of concerns). Relying on a complete, existing product may also reduce maintenance.

Expected outcomes:

  • The service should handle authentication.

  • The mission consists of choosing such a service (e.g. authentik, Keycloak, Ory, etc.), deploying it (with automated deployment), configuring and integrating it, and writing documentation.

  • You will also have to provide a plan and scripts for user migration.

  • The integration might be the most challenging part, as it should be transparent to users for simple workflows. The first goal is to have authentication done through the service and to provide OAuth for app developers.

  • Skills required/preferred: open to any

  • Slack channels: #product-opener

  • Potential mentors: Alex Garel, Johannes A. (hangy)

  • Project duration: 350h

  • Difficulty rating: Medium

Project 7: Build a taxonomy editor

Description: 

Taxonomies are at the heart of Open Food Facts in many respects. They help identify components (ingredients, labels, brands,…) and link them to useful properties, which are the basis of the Nutri-Score, the Eco-Score, allergen identification and some other properties.

Each taxonomy is a DAG (directed acyclic graph) where leaves have one or more parents. Currently each taxonomy is stored as a raw text file in our repository: https://github.com/openfoodfacts/openfoodfacts-server/tree/main/taxonomies.

While effective for the application, this format is quite cumbersome to edit for contributors.

We would like to have a tool (online or standalone) to edit taxonomies. 

Expected outcomes: 

The tool should:

  • help quickly find an element with a search

  • help visualize the hierarchy of components

  • help visualize a component and its synonyms in multiple languages

  • indicate inherited properties for an element, and signal when there is more than one

  • enable editing of those names, synonyms and properties

  • run some validation on names, synonyms and properties (no duplicate, specific formats, etc.)
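
To illustrate what "inherited properties" can mean in a DAG, here is a minimal sketch that collects the value of a property from the nearest ancestors of an entry and reports when several different values are inherited. The entry names, the property name and the dictionary-based representation are hypothetical; they do not reflect the actual taxonomy file format.

```python
def inherited_values(entry, prop, parents, properties):
    """Return the set of values of `prop` inherited from the nearest ancestors."""
    values = set()
    seen = set()
    stack = list(parents.get(entry, []))
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if prop in properties.get(node, {}):
            # This ancestor defines the property: stop climbing on this path.
            values.add(properties[node][prop])
        else:
            stack.extend(parents.get(node, []))
    return values


# Hypothetical taxonomy fragment: one entry with two parents.
parents = {
    "en:strawberry-yogurts": ["en:fruit-yogurts", "en:sweetened-yogurts"],
    "en:fruit-yogurts": ["en:yogurts"],
    "en:sweetened-yogurts": ["en:yogurts"],
}
properties = {
    "en:yogurts": {"example_property": "A"},
    "en:sweetened-yogurts": {"example_property": "B"},
}

values = inherited_values("en:strawberry-yogurts", "example_property", parents, properties)
has_conflict = len(values) > 1  # the editor should signal this case
```

An editor could run a check like this for every entry after each change, and highlight entries where more than one value is inherited.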

As a bonus, it would be really interesting to know the impact of a modification on the application. For that, we could imagine simple APIs (one for each taxonomy) in the Open Food Facts application to visualize which products would be affected by a change. This feedback could be a really useful way to ensure no error is made (unexpected side effects).

Technologies: On the technical side, you can choose freely, in agreement with your mentor, which technology to use. However, the ability of the contributor community to maintain it in the long run is an important criterion.

Skills required/preferred: The candidate should be creative, but also have good logical thinking skills and be able to anticipate possible problems.

  • Slack channels: #taxonomies

  • Potential mentors: Stéphane Gigandet / Alexandre Garel

  • Project duration: 350h

  • Difficulty rating: Medium

Project 8: Autonomous taxonomy service

Description: 

Taxonomies are at the heart of Open Food Facts in many respects. They help identify components (ingredients, labels, brands,…) and link them to useful properties, which are the basis of the Nutri-Score, the Eco-Score, allergen identification and some other properties.

Each taxonomy is a DAG (directed acyclic graph) where leaves have one or more parents.

Currently each taxonomy is loaded in memory by the Perl application, and this takes quite a lot of memory. Most of the time the structure is accessed through the dedicated module, Tags.pm, which offers a basic API. Still, there are many places where the structure is used directly.

Expected outcomes:

  • We would like to move this part into an independent service. The mission will be to build an independent service for taxonomies. It will offer a rich API to cover all current uses in a practical way.

  • The service will most likely rely on a database to ensure good performance and avoid reinventing the wheel. You will have to study which type of database fits best for the task.

  • You will have to understand the current Perl code to understand the use cases for taxonomies (your mentor will help you in this task). Create a service for taxonomies with a clean API, and implement usage of the API at least for the simplest use cases (we can replace the old code progressively).

  • Having this independent service will help improve scalability, performance, maintainability and may offer new opportunities for services around taxonomies.
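
As one example of what relying on a database could look like, the sketch below stores parent links in SQLite and retrieves all ancestors of an entry with a recursive query. This is only an illustration under assumptions: the table name `parent_of`, the function names and the sample entries are hypothetical, and choosing the right type of database is part of the project.

```python
import sqlite3


def build_db(edges):
    """Create an in-memory database holding (child, parent) links."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE parent_of (child TEXT, parent TEXT)")
    conn.executemany("INSERT INTO parent_of VALUES (?, ?)", edges)
    return conn


def ancestors(conn, entry):
    """Return all ancestors of `entry`, sorted by name."""
    rows = conn.execute(
        """
        WITH RECURSIVE up(node) AS (
            SELECT parent FROM parent_of WHERE child = ?
            UNION  -- UNION removes duplicates when two paths share an ancestor
            SELECT parent_of.parent
            FROM parent_of JOIN up ON parent_of.child = up.node
        )
        SELECT node FROM up ORDER BY node
        """,
        (entry,),
    ).fetchall()
    return [row[0] for row in rows]


conn = build_db([
    ("en:strawberry-yogurts", "en:fruit-yogurts"),
    ("en:strawberry-yogurts", "en:sweetened-yogurts"),
    ("en:fruit-yogurts", "en:yogurts"),
    ("en:sweetened-yogurts", "en:yogurts"),
    ("en:yogurts", "en:dairies"),
])
result = ancestors(conn, "en:strawberry-yogurts")
```

A graph database or a precomputed ancestor table might serve the same queries; comparing such options is exactly the study the project asks for.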

Technologies: You can choose freely, in agreement with your mentor, which technology to use. However, the ability of the contributor community to maintain it in the long run is an important criterion. Some understanding of Perl is necessary.

Skills: This project is not easy to bootstrap. The candidate should be able to cope with complexity, and be able to quickly understand the current code and extract a possible model from it (your mentor will help).

  • Slack channels: #taxonomies

  • Potential mentors: Stéphane Gigandet

  • Project duration: 350 h

  • Difficulty rating: Hard

Project 9: Build an automatic rule editing engine

Description: 

Every day, Open Food Facts receives a lot of data from different sources, but the main contribution comes from users using a smartphone and contributing photos, data, answers to simple questions, etc. While this is a really efficient way to get data, quality is an important challenge. We have contributors capable of observing data, finding anomalies and applying corrections. 

For example, one might observe that if a yogurt has sugar in it, it must belong to the “sweetened yogurt” category (and not only to the yogurt category).
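
The yogurt example could be expressed as a small declarative rule, as in the sketch below. This is only one possible shape for a rule. The `CategoryRule` class and the simplified product fields `categories` and `ingredients` are hypothetical and do not reflect the actual Open Food Facts data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CategoryRule:
    """If a product has a category and an ingredient, add another category."""
    required_category: str
    required_ingredient: str
    category_to_add: str

    def matches(self, product: dict) -> bool:
        categories = product.get("categories", [])
        return (
            self.required_category in categories
            and self.required_ingredient in product.get("ingredients", [])
            and self.category_to_add not in categories
        )

    def apply(self, product: dict) -> dict:
        # Return a corrected copy; the original product is left untouched.
        if not self.matches(product):
            return product
        return {
            **product,
            "categories": [*product["categories"], self.category_to_add],
        }


rule = CategoryRule("en:yogurts", "en:sugar", "en:sweetened-yogurts")
fixed = rule.apply({"categories": ["en:yogurts"], "ingredients": ["en:milk", "en:sugar"]})
```

Because the rule is data rather than a one-off edit, it can be re-applied whenever a new contribution overwrites the correction.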

Many corrections can lead to massive edits, and there is a dedicated tool for that. But a correction might be immediately overwritten by a new contribution, which is discouraging for advanced contributors. This one-shot modification process does not allow us to capitalize on fixes.

While thinking about it with some advanced contributors, we imagined a solution, which would involve rules that are applied automatically to the data. While the implementation of rule resolution and application in the server code is quite easy, the hard part is to have a tool for contributors to find and test rules before capitalizing on them.

Expected outcomes: 

You will have to build a tool to give advanced contributors the ability to write rules that are applied automatically to data.

  • The tool must help users write rules (either directly or maybe with an interface) and validate their syntactic correctness.

  • It must provide a way to test the impact of a rule. The ability to cross-reference the matches (and non-matches) with a search, or to easily test different versions of the rule, might help users dig into the results and refine the rule.

  • The tool must manage rule ownership and versioning.

  • The tool can be online or standalone. It may use the API to measure the impact of a rule (a specific API can be developed by the team, if needed), or maybe query the MongoDB database directly, or if needed use a specific database (but synchronization would then have to be considered).

  • Optionally, gamification might be considered. We have a gamification service to collect achievements, and rule impact could be registered.

  • Optionally we could consider if the rule building service might also become, in the future, the service that applies rules to incoming edits.
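
Testing the impact of a rule, and comparing two versions of it, could be sketched as a dry run over a sample of products. In the sketch below a rule is simply a function that says whether it would change a product. The function names, the product fields and the sample data are hypothetical.

```python
def matched_codes(rule, products):
    """Return the codes of the products a rule would change (dry run)."""
    return {p["code"] for p in products if rule(p)}


def compare_versions(old_rule, new_rule, products):
    """Show how the set of affected products differs between two versions."""
    products = list(products)
    before = matched_codes(old_rule, products)
    after = matched_codes(new_rule, products)
    return {
        "newly_matched": sorted(after - before),
        "no_longer_matched": sorted(before - after),
        "unchanged": sorted(before & after),
    }


# Hypothetical sample data and two versions of the same rule.
sample = [
    {"code": "1", "categories": ["en:yogurts"], "ingredients": ["en:milk", "en:sugar"]},
    {"code": "2", "categories": ["en:yogurts"], "ingredients": ["en:milk"]},
    {"code": "3", "categories": ["en:yogurts"], "ingredients": ["en:milk", "en:honey"]},
    {"code": "4", "categories": ["en:sodas"], "ingredients": ["en:sugar"]},
]


def version_1(p):
    return "en:sugar" in p["ingredients"]


def version_2(p):
    sweet = {"en:sugar", "en:honey"} & set(p["ingredients"])
    return "en:yogurts" in p["categories"] and bool(sweet)


report = compare_versions(version_1, version_2, sample)
```

A report like this lets a contributor see which products a refinement gains or loses before the rule is put into use.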

Skills required/preferred: Some prior experience with databases is necessary; MongoDB knowledge would be a plus. An understanding of things like regular expressions is also required.

  • Slack channels: #product-opener

  • Potential mentors: Alexandre Fauquette / Pierre Slamich

  • Project duration: 350h

  • Difficulty rating: Hard

Project 10: Community federation portal and gamification

Description:

Open Food Facts data is collected and curated by its community. As of today however, we lack tools to synchronize efforts in a simple and convenient way. We would also like efforts from contributors to be acknowledged and add as much fun as possible.

This project aims at building a simple website that could help track some areas of interest by contributors. Areas of interest might be a combination of a geographical area with either a product type (product category) or a brand or a shop network, etc.

Expected outcomes: 

  • The site will let users define “areas of interest”. For an area of interest, people should be able to “subscribe” to the group, and maybe exchange messages (using existing tools, like the forum or a Slack channel).

  • The site should gather general metrics on the area of interest, as well as quality metrics (we already have quality facets). It should also offer an event stream. Users must be able to use all of these to contribute and to help maintain data quality as well as completeness.

  • This should also be combined with the openfoodfacts-events service to build leaderboards for the area of interest.

Skills required/preferred: The choice of technical stack is free, but it must be open source and validated by the Open Food Facts team. If some APIs need to be added to Open Food Facts, the Open Food Facts team will take care of it. Upgrades to openfoodfacts-events should be considered part of this project.

The candidate should be able to communicate with the community, and be creative and pragmatic in order to build a useful tool.

  • Slack channels: #gamification

  • Potential mentors: Charles Népote / Alex Garel

  • Project duration: 350h

  • Difficulty rating: Easy



Some parts of this page have been adapted from the excellent GNOME GSOC page