Data Science Master Thesis

Internship, Stockholm

Here at Normative we’re automating the sustainability reporting of big companies, providing stakeholders with crucial information and giving companies the incentives and guidance they need to drastically reduce their environmental footprint. We’ve been recognised by the Norrsken Foundation as one of the most impactful startups in the Nordics, and over the last year our revenue has grown by more than 70%, showcasing our potential to help all major corporations with their sustainability reporting. Now we need your help!

We have four master thesis suggestions:

1. Visualize Trade Flow and Life Cycle Assessment Value Chain Data


Each product that we consume has a journey. For instance, the raw materials inside the computers we buy might come from mines in the DRC, be shipped to factories in Germany, be assembled in China, and finally be shipped to retail warehouses in Sweden.

Each of these trade flows can be analysed using so-called input-output data, which quantifies the monetary flows between industries. In essence, it tracks how much of the money spent on 10,000 SEK worth of goods in the Swedish electronics sector ends up in other industries across the world (e.g. the mining sector in the DRC). These trade flows can also be used to calculate the environmental and social impacts and risks of a good or service.
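The propagation of spending through supply chains is commonly modelled with the Leontief input-output model, where the total output x needed to satisfy final demand y is x = (I - A)^-1 y, and A holds the inputs each sector buys per unit of its own output. The sketch below assumes a hypothetical three-sector economy; the coefficients and emission intensities are made-up illustrative numbers, not Normative data.

```python
import numpy as np

# HYPOTHETICAL three-sector toy economy (all numbers are illustrative only).
SECTORS = ["SE electronics", "DE manufacturing", "DRC mining"]

# Technical coefficient matrix A: A[i, j] is the SEK of input from sector i
# needed to produce 1 SEK of output in sector j.
A = np.array([
    [0.05, 0.00, 0.00],
    [0.30, 0.10, 0.00],
    [0.05, 0.20, 0.05],
])

# HYPOTHETICAL emission intensities in kg CO2e per SEK of sector output.
f = np.array([0.01, 0.03, 0.08])


def total_output(A: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Solve the Leontief system (I - A) x = y for total output x."""
    n = A.shape[0]
    return np.linalg.solve(np.eye(n) - A, y)


# Final demand: buying 10,000 SEK worth of goods from Swedish electronics.
y = np.array([10_000.0, 0.0, 0.0])
x = total_output(A, y)
emissions = f * x  # kg CO2e attributed to each sector along the supply chain

for name, output, co2 in zip(SECTORS, x, emissions):
    print(f"{name:>17}: {output:9.1f} SEK, {co2:7.1f} kg CO2e")
print(f"Total footprint: {emissions.sum():.1f} kg CO2e")
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse explicitly, which matters once A has thousands of country-sector pairs.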

The thesis will be about visualising and analysing trade flow data, as well as the environmental and social cost of global trade. One hypothesis to test could be “European companies are getting more environmentally efficient because they are exporting their emissions”.

2. Machine Learning for Automatic Life Cycle Assessment


When you produce any type of product, such as a laptop, a car or an apple pie, you usually start with a so-called bill of materials (BOM). The BOM specifies the sub-components or materials used to create the product. For instance, the BOM for a cake is simply the list of ingredients you need to bake it.

At Normative we have a huge database containing data on the environmental impact of tens of thousands of goods, services, components, materials, etc. (e.g. the kg of CO2 emitted per kg of flour used, or the land use impact in m2 of consuming a liter of milk).

Using data science and machine learning, it would be possible to automatically create an environmental assessment of a product in Normative before it is produced, simply by sending its BOM as input to an API that matches each item with the appropriate environmental data. The thesis will be about enabling this by using machine learning to map BOM entries to environmental data. You will train the model that makes this connection between BOMs and the environmental database.
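A simple baseline for the mapping step is fuzzy string matching: each free-text BOM line is matched to the most similar entry in an emission-factor database, and the footprint is the sum of quantity times emission factor. The sketch below uses character-trigram cosine similarity as a stand-in for a trained model; the database entries, factors and function names are hypothetical, and it assumes BOM quantities are already in the same unit as the matched factor.

```python
from collections import Counter
from math import sqrt

# HYPOTHETICAL emission-factor database: name -> (kg CO2e per unit, unit).
# Values are placeholders for illustration, not real Normative data.
EMISSION_FACTORS = {
    "wheat flour": (0.6, "kg"),
    "cow milk": (1.3, "l"),
    "butter": (9.0, "kg"),
    "white sugar": (0.9, "kg"),
    "apples, fresh": (0.3, "kg"),
}


def trigrams(text: str) -> Counter:
    """Character trigram counts of a lower-cased, padded string."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two trigram count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def match(item: str) -> tuple[str, float]:
    """Return the best-matching database entry and its similarity score."""
    query = trigrams(item)
    scores = {name: cosine(query, trigrams(name)) for name in EMISSION_FACTORS}
    best = max(scores, key=scores.get)
    return best, scores[best]


def assess_bom(bom: list[tuple[str, float]]) -> float:
    """Sum quantity x emission factor over all BOM lines (kg CO2e)."""
    total = 0.0
    for item, quantity in bom:
        name, score = match(item)
        factor, unit = EMISSION_FACTORS[name]
        print(f"{item!r} -> {name!r} (score {score:.2f}): {quantity} {unit}")
        total += quantity * factor
    return total


# A BOM for an apple pie, with free-text item names as a user might send them.
bom = [("Flour, wheat", 0.3), ("Apples", 0.8), ("Butter (unsalted)", 0.15), ("Sugar", 0.1)]
print(f"Estimated footprint: {assess_bom(bom):.2f} kg CO2e")
```

In a thesis, `match` would likely be replaced by a learned model (for example text embeddings or a classifier trained on labeled BOM-to-factor pairs), while the surrounding quantity-times-factor aggregation stays the same.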

3. Analysis of Global Corporate Carbon Reporting


Corporate carbon reporting has gained a lot of traction in recent years. It is widely recognized that if we cannot measure corporate emissions and follow up on emission targets, international climate agreements such as the Paris Agreement will be extremely hard to meet. Normative’s core product is an accounting tool that algorithmically generates carbon reports that are as reliable as the financial reports generated by conventional accounting software.

However, most companies do not use any tool like Normative at all. As a result, there are no methods or mechanisms in place to check whether their carbon reporting is credible.

This project aims to develop a statistical methodology/model for assessing the validity of the carbon emissions claimed in corporate reports that were produced without a tool like Normative. To support this, Normative has created two databases specifically for this project:

Database 1 contains claimed emissions for 2000 companies that are not Normative customers
Database 2 contains total emissions of over 160 industry sectors in all major countries, as measured by statistical agencies all across the world.
Database 1 can be seen as “bottom-up” and database 2 as “top-down”. Moreover, unlike database 1, database 2 is checked against emissions measured in the atmosphere, which makes it far more reliable, so it can be used to benchmark the reliability of the claims in database 1.

This thesis will be about quantifying how much companies “underreport” compared to the actual global measures. If successful, we believe this project could gain a lot of international media attention, since it would for the first time make it possible to detect and quantify underreporting and possible greenwashing.
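One way to frame the comparison is to divide each company's claimed emissions by a top-down expectation, such as its revenue times the emission intensity of its sector and country. The sketch below assumes that revenue is available for each company and that sector intensities can be derived from a database like database 2; the field names, intensities and companies are all hypothetical.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class CompanyClaim:
    """One company's reported ('bottom-up') emissions; fields are hypothetical."""
    name: str
    sector: str
    country: str
    revenue_msek: float   # ASSUMPTION: revenue is available for each company
    claimed_tco2e: float  # emissions the company claims in its report


# HYPOTHETICAL top-down intensities: sector emissions / sector output
# (t CO2e per MSEK), as could be derived from a database like database 2.
SECTOR_INTENSITY = {
    ("steel", "SE"): 120.0,
    ("retail", "SE"): 4.0,
}


def reporting_ratio(claim: CompanyClaim) -> float:
    """Claimed emissions divided by the top-down expectation (1.0 = in line)."""
    expected = claim.revenue_msek * SECTOR_INTENSITY[(claim.sector, claim.country)]
    return claim.claimed_tco2e / expected


def typical_ratio(claims: list[CompanyClaim]) -> float:
    """Median ratio computed on a log scale, so 0.5 and 2.0 are symmetric."""
    return math.exp(statistics.median(math.log(reporting_ratio(c)) for c in claims))


claims = [
    CompanyClaim("Company A", "steel", "SE", revenue_msek=1_000, claimed_tco2e=60_000),
    CompanyClaim("Company B", "steel", "SE", revenue_msek=500, claimed_tco2e=45_000),
    CompanyClaim("Company C", "retail", "SE", revenue_msek=2_000, claimed_tco2e=8_000),
]
for c in claims:
    print(f"{c.name}: claimed/expected = {reporting_ratio(c):.2f}")
print(f"Typical ratio: {typical_ratio(claims):.2f}")
```

A ratio well below 1.0 is a screening signal rather than proof: a company may genuinely be cleaner than its sector average, so a real methodology would need to model that variation.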

4. Extract Sustainability Information from Unstructured Data

#Data scraping
#Machine learning

One factor that makes carbon accounting difficult for companies is that much of the important corporate sustainability data used to estimate carbon footprints is stored in unstructured .pdf files. The main examples of such files include:

Corporate sustainability reports, such as this one.
Utility invoices, such as this one.
What makes this problem especially challenging is that each .pdf file is structured in a unique way. Possible approaches include parsing the file and calculating probability metrics based on keywords such as “kWh”, “CO2” and “GHG”, or training a machine learning model on our labeled database of training data. The data we have at our disposal are:

10,000 .pdf files
A labeled database of training data consisting of thousands of data points we want to extract from the files.
A potential research question could be: is it possible to apply machine learning to “extract key metrics from corporate reports”?
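The keyword-based approach can be sketched as a baseline: assuming the text has already been extracted from the .pdf by some PDF-to-text step, find every number followed by an energy unit and score each candidate by the keywords that appear just before it. The keyword weights, the 60-character window and the function names are hypothetical choices; a model trained on the labeled database would replace the hand-set weights.

```python
import re
from typing import Optional

# HYPOTHETICAL keyword weights: words that, when they appear just before a
# value, make it more likely to be the total energy figure we want.
KEYWORDS = {"total": 2.0, "electricity": 1.5, "consumption": 1.5, "energy": 1.0}
UNIT_TO_KWH = {"kwh": 1.0, "mwh": 1_000.0, "gwh": 1_000_000.0}

# A number (with optional space/comma thousands separators and a decimal
# point) followed by an energy unit. The lookbehind prevents a match from
# starting in the middle of a longer number.
ENERGY_RE = re.compile(
    r"(?<![\d.,])(\d{1,3}(?:[ ,]\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?)\s*(kWh|MWh|GWh)\b",
    re.IGNORECASE,
)


def parse_number(raw: str) -> float:
    """Remove thousands separators (spaces or commas) and parse as a float."""
    return float(raw.replace(" ", "").replace(",", ""))


def extract_energy_kwh(text: str, window: int = 60) -> Optional[float]:
    """Return the highest-scoring energy value in kWh, or None if none is found."""
    best_value: Optional[float] = None
    best_score = float("-inf")
    for m in ENERGY_RE.finditer(text):
        # Score the candidate by the keywords in the text just before it.
        context = text[max(0, m.start() - window):m.start()].lower()
        score = sum(w for kw, w in KEYWORDS.items() if kw in context)
        if score > best_score:  # strict '>' keeps the earliest match on ties
            best_value = parse_number(m.group(1)) * UNIT_TO_KWH[m.group(2).lower()]
            best_score = score
    return best_value


invoice_text = (
    "Invoice period: 2023-01. Meter reading 48 213 kWh. "
    "Total electricity consumption this period: 1 250 kWh. Price per kWh: 1.20 SEK"
)
print(extract_energy_kwh(invoice_text))
```

This baseline deliberately ignores harder cases such as decimal commas, non-breaking spaces and values split across table cells, which is exactly where a learned model would be expected to help.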

5. Original Idea

If you have your own idea related to what we do (sustainability calculation, visualization and reporting), we’re open to hearing it. This could also be a combination of some of the suggestions above.


Your background is likely in data science with a strong interest in optimization problems and modelling.

Next step

If this sounds interesting to you, apply below with your CV and a motivation explaining why you want to take on project 1, 2, 3 or 4, or one of your own ideas.

We adopt a continuous selection process, and the deadline for applying is Wednesday 27 November at 23:59 CET.

We will hold a webinar about the thesis topics at the beginning of December; more information will follow. If you have any questions, reach out to