University of Oulu, Center for Ubiquitous Computing
.01

HOME

PERSONAL DETAILS
Center for Ubiquitous Computing, P.O.Box 4500, FIN-90014, Finland
panos.kostakos@oulu.fi
+358 505 950 718
Hello. I am passionate about Big Crime and Big Data. Welcome to my personal and academic profile.

BIO

ABOUT ME

I am the author of various articles on the illegal arms trade, cocaine smuggling, illegal migration and terrorism. I am currently a postdoctoral researcher at the Center for Ubiquitous Computing, University of Oulu, Finland. My research focuses on the development of new computational methodologies and their application to organised crime, terrorism, and corruption.
Online Viz

Topic Modelling on mafia related news articles

Named Entity Recognition (NER)

Machine Perception of the Mafia

CUTLER data flow with Sankey Particles

.02

H2020 Projects

FUNDED PROJECTS

CUTLER 2018-2020

H2020

GRAGE 2014-2018

H2020

OVERVIEW

CUTLER - Coastal Urban Development Through The Lenses of Resiliency

H2020-CO-CREATION-2017

Coastal urban development incorporates a wide range of development activities that take place because the water element is part of the fabric of the city. This element may take different forms (e.g. a bay, a river, or a brook), but in almost all cases the surrounding area constitutes what may be considered the heart of the city.

Every city that incorporates the water element in its fabric is confronted with the fundamental requirement of developing policies that drive development in the surrounding area while balancing: a) economic growth, b) protection of the environment, and c) safeguarding of social cohesion. This requirement is tightly connected with the concept of Urban Resilience, which is the capacity of individuals, communities, businesses and systems within a city to survive, adapt and grow no matter what chronic stresses and acute shocks they experience.

In developing policies that add value to the resilience of a city, we shift the existing paradigm of policy making, which is largely based on intuition, towards an evidence-driven approach enabled by big data. Our attention is placed on policies related to the water element. Our basis is the sensing infrastructures installed in the cities, which offer demographic data, statistical information, sensor readings and user-contributed content, and together form the big data layer. Methods for big data analytics are used to measure the economic activity, assess the environmental impact and evaluate the social consequences. The extracted pieces of evidence are used to inform, advise, monitor, evaluate and revise the decisions made by policy planners.

Finally, effective policies are developed dealing with: a) the economic and urban development of Thermaikos Bay, Thessaloniki, b) the transformation of Düden Brook into a recreation and park area, Antalya, c) the development of a Storm Water Plan, Antwerp, and d) the review of the Country Development Plan in the River Lee territory, City of Cork.

GRAGE - Grey and green in Europe: elderly living in urban areas

H2020-MSCA-RISE-2014

The EU faces many challenges in achieving more balanced regional development and a sustainable economic recovery. Many of those challenges have to do with the ageing population trend, urbanization and an environment under distress. More liveable and efficient communities are a target to be reached in Europe, where the “silver hair” trend can become a challenging opportunity from a social, economic and cultural perspective. Although those challenges are strongly interlinked, solutions provided in urban contexts often do not pay due attention to the social processes underlying urban trends or to the needs and behaviour of elderly citizens.

GRAGE intends to help fill this gap by developing winning ideas to promote active, harmonious and inclusive citizenship for elderly people living in urban contexts. The consortium gathers ground-breaking expertise from different scientific backgrounds (legal, economic, humanities, engineering), from academic and non-academic institutions, belonging to several countries (from the EU and Ukraine). Using a mix of methodologies, the research and innovation programme of the project will revolve around the idea of citizenship as a collector of interests, a healthy environment and suitable urban solutions for an ageing society. The main themes will be green buildings, food and urban agriculture, and information and language technology. Researchers will analyze the role of these themes in transforming cities into environments that support green and healthy lifestyles for elderly people. GRAGE intends to boost dialogue throughout Europe, strengthening both academic and non-academic collaboration and a practical understanding of elderly living across Europe. Such cooperation can have a series of returns for Europe, ranging from more effective solutions to strategic challenges (sustainable cities and demographic change) to new business opportunities for European firms offering solutions and products for smart, inclusive and ageing societies at a global level.

Project ID: 645706

Funded under: H2020-EU.1.3.3. - Stimulating innovation by means of cross-fertilisation of knowledge

Total cost: EUR 828 000

.03

CURRENT RESEARCH

FORTHCOMING PAPERS
Face Detection, Forensics, Online Behaviour

Detecting the Age of Twitter Users


Corpora, Forensics

Paraphrasing detection


About The Project

On Web Based Sentence Similarity for Paraphrasing Detection

Mourad Oussalah and Panos Kostakos, Center for Ubiquitous Computing, University of Oulu, P.O.Box 4500, FIN-90014, Oulu, Finland

Semantic similarity measures play a vital role in information retrieval, natural language processing and paraphrasing detection. With the growing number of plagiarism cases in both the commercial and research communities, designing efficient tools and approaches for paraphrasing detection has become crucial. This paper contrasts a web-based approach, which analyses the snippets returned by a search engine, with a WordNet-based measure. Several refinements of the web-based approach will be investigated and compared. The approaches will be evaluated against the Microsoft paraphrasing dataset and the results discussed.

To appear: KDIR 2017, International Conference on Knowledge Discovery and Information Retrieval.

Online Behaviour, Prostitution

Early detection of individuals at risk.


About The Project

Early detection of individuals at risk of being drawn into online sex trade: A mixed method approach using covert online ethnography, SNA and machine learning.

by: Panos Kostakos, University of Oulu; Lucie Špráchalová, Charles University; Mourad Oussalah, University of Oulu.

How can we identify individuals at risk of being drawn into online sex trade? Recent research shows that technology enables a greater number of individuals to be involved in illicit sex markets. Because technology reduces transaction costs and breaks down market entry barriers, the number of open-air and indoor sex workers has increased very rapidly in recent years. This has far-reaching implications for economic development, social cohesion, and public health. As a result, there is an urgent need for tools that prevent the spread of illegal sex trade online. In this paper, we present work in progress on a tool that uses social network data to enable early detection of individuals at risk of being drawn into sex trade online. Our method can be summarised as follows. First, we extracted users’ profiles (N=28,000) from an online European adult forum. Second, we conducted covert online ethnography and carried out interviews with a random sample of the users. This enabled us to develop a user typology that highlights the social organisation of the illicit market in conjunction with self-reported data about the risk of exposure to illicit activities. Third, we used graph theory to analyse the structural position of users. Finally, we used machine learning to train a model that predicts the risk and social role of individual users within the network.

To appear: Illicit Networks Workshop, Adelaide, Australia, 11-12 December 2017.
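The graph-theory step in the abstract above (analysing the structural position of users) can be illustrated with a minimal sketch. The abstract does not say which structural measures were used, so normalised degree centrality is an assumption here, and the user IDs and edge list are hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return normalised degree centrality for an undirected graph.

    `edges` is an iterable of (user_a, user_b) pairs. Degree centrality
    is only one possible notion of "structural position"; it is used
    here as an assumed example, not as the authors' chosen measure.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbours[a].add(b)
        neighbours[b].add(a)

    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    # A node connected to every other node scores 1.0.
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


# Hypothetical forum interactions (e.g. replies between users).
edges = [("u1", "u2"), ("u1", "u3"), ("u1", "u4"), ("u2", "u3")]
centrality = degree_centrality(edges)
most_central = max(centrality, key=centrality.get)
```

Scores such as these could then serve as input features for the machine learning step, alongside the typology labels derived from the ethnographic work.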

Online Behaviour, Smuggling

Predicting Refugee flows with Google Trends


About The Project

Correlating refugee flows with Internet search data

by: Panos Kostakos, University of Oulu; Simo Hosio, University of Oulu; Daniela Irrera, University of Catania; Christoph Breidbach, University of Melbourne; Vassilis Kostakos, Melbourne, Australia.

Can Internet search data be used as a proxy to monitor refugee mobility? Thousands of refugees and migrants cross the Mediterranean Sea to reach various parts of Europe every year. The increasing number of irregular crossings is also leading to more deaths. The soaring refugee death toll creates an urgent need for novel tools that monitor and forecast refugee flows. Because existing monitoring systems rely extensively on international and regional human networks, there is a lack of tools that forecast refugee mobility patterns with hyperlocal precision. As a result, local authorities and search and rescue (SAR) organisations cannot deploy resources in a timely and effective manner to manage the risks of irregular border-crossing. This study investigates the correlation between refugee mobility data (arrival dates) and Internet search data from Google Trends. Google Trends is a freely accessible tool that provides access to Internet search data by analysing a sample of all web queries submitted by end-users to Google. This online tool has already been used to study a variety of phenomena including suicide, drug use, unemployment, drug crime, and influenza, to name a few. In our method, we carried out interviews with end-user organisations and a survey with refugees in Greece (entry point) and Finland (destination point) to identify what search queries they used in every leg of their journey. Next, we conducted time series analysis on Google search data to investigate whether interest in user-defined and/or generic search queries correlates with the refugee arrival dates recorded by UNHCR and SAR organisations. Finally, Pearson’s correlation coefficients were calculated as a measure of association between refugee arrival dates, SAR data, and Internet search trends.

To appear: Illicit Flows Workshop, Adelaide, Australia, 13 December 2017.
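The final step of the abstract above, Pearson's correlation between arrival data and search trends, can be sketched as follows. The weekly figures are made-up illustrative values, and the optional lag (search interest leading arrivals by a number of weeks) is an assumption added for illustration; the abstract does not state that lags were tested.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_pearson(search, arrivals, lag):
    """Correlate search interest at week t with arrivals at week t + lag."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(search, arrivals)
    return pearson(search[:-lag], arrivals[lag:])


# Hypothetical weekly values: search interest index and recorded arrivals.
search_interest = [10, 20, 30, 40, 50, 60]
weekly_arrivals = [0, 3, 5, 7, 9, 11]

r_same_week = lagged_pearson(search_interest, weekly_arrivals, lag=0)
r_one_week_lead = lagged_pearson(search_interest, weekly_arrivals, lag=1)
```

A coefficient close to 1 or -1 indicates a strong linear association, but it does not by itself show that searches predict arrivals.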

.05

STUDENTS

VISITING RESEARCHERS

Olga Kiriakouli

Harokopio University

Luci Špráchalová

Charles University

Lu Jiyan (陆积堰)

NorthWestern Polytechnical University

Olga Kiriakouli

Twitter Data Analytics, Visualisation, Online tools

I have studied Informatics and Telematics (BSc.), and majored in web services management workflow mechanisms. I have worked as project manager for the Data Digitization of Municipality of Eleusis (Greece), web developer in various companies, and IT Support Specialist and General Officer at Manpower Agency of Greece. I am doing my MSc in Web Engineering at Harokopio University of Athens where I focus on analyzing social networking services.

Luci Špráchalová

Online ethnography, sub-cultures, illicit online sex work

I hold a B.A. in Social Pathology and Prevention and an M.A. in Social Pedagogy. Both of my final theses were about risky sexual behaviour, sexual self-perception and the perception of normality. I have worked as a free-time pedagogue and social worker with children from at-risk areas and poor backgrounds, where I also focused on sexual education. I am now project manager for a research project dedicated to university students, their life aspirations and their study motivation. I am also an assistant at the Department of Social Pathology and Sociology, Pedagogical Faculty, where I teach ethics, sociology and philosophy (all basic courses). I am doing my PhD in Sociology at Charles University in Prague (I have just finished my second year), and I focus on sexual minorities and communities/subcultures, especially the BDSM community online.

Lu Jiyan

Vocabulary Memory Based on Semantic and Phonetic Word Associations

I’m in the fourth year of my bachelor’s degree at NorthWestern Polytechnical University in China. My bachelor thesis proposes an effective way to discover and memorize new English vocabulary based on both semantic and phonetic associations. The proposed method aims to automatically find the words most strongly associated with a given word. Semantic association is measured by calculating the cosine similarity of two word vectors, and phonetic association is measured by calculating the longest common subsequence of the phonetic symbol strings of two words. Finally, the method is implemented as a web application.
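The two measures named above can be sketched in a few lines. The word vectors and phonetic strings below are toy values chosen for illustration, and dividing the longest common subsequence by the length of the longer string is an assumed normalisation that the description does not specify.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length word vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def lcs_length(s, t):
    """Length of the longest common subsequence (dynamic programming)."""
    prev = [0] * (len(t) + 1)
    for a in s:
        curr = [0]
        for j, b in enumerate(t, start=1):
            if a == b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def phonetic_association(s, t):
    """LCS length divided by the longer string (assumed normalisation)."""
    if not s or not t:
        return 0.0
    return lcs_length(s, t) / max(len(s), len(t))


# Toy three-dimensional vectors standing in for real word embeddings.
semantic = cosine_similarity([1.0, 2.0, 0.0], [2.0, 4.0, 0.0])
# Simplified phonetic strings for two rhyming words (hypothetical).
phonetic = phonetic_association("nait", "lait")
```

How the two scores are combined into a single ranking is a design choice the description leaves open; a weighted sum would be one simple option.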
.06

STUDENT PROJECTS (NLP)

AGE DETECTION

Building age detection algorithms from short texts.

Students: Benedikt Putz and Gorka Urbizu

Identifying the age of authors from their use of language is a major scientific challenge. Age awareness tools are valuable in overcoming various forensic linguistics challenges such as threatening letters, ransom demands, radicalisation, and pedophilia. A major limitation to overcome is the lack of data: in most cases, authorities have access only to fragmented pieces of written text. The aim of this project is to build models that predict the age of users based on their language usage on Twitter. The novelty of our approach is the use of approximate age recognition software in combination with custom Natural Language Processing tools.

Datasets: Twitter historical archive of Dutch Twitter users (10M tweets).

 

Crime Sensing

Crime sensing using data from hotel reviews in London, UK. 

Student: Niko Leinonen

Tourists are easy targets for theft and victimisation. For this reason, they are much more likely to remain alert, attentive, and vigilant to suspicious activities. Tourists are also more likely to report criminal incidents to the police, to friends, and on social media. In this project, you will develop a crime sensing tool using textual data from hotel reviews and police-recorded crime incidents from the wider London Metropolitan Area. Can hotel reviews be used as a proxy for sensing criminal activities? Can we use reviews to predict criminal behaviours in the wider vicinity of hotels? How do crime rates impact local businesses? How can we sense Victimisation Sentiments using customers’ reviews?

Available datasets: i) Historical archive of hotel reviews from London, ii) Data.police.uk.

 

Terrorism & Echo Chambers

Meta-Terrorism: identifying linguistic patterns in public discourse after an attack.

Students: Markus Nykanen and Mikael Martinviita, University of Oulu.

Terrorist events send shockwaves across communities and often lead to group polarisation and community conflict. Open source communication data collected from social media channels can shed light on the pull and push mechanisms that drive group polarisation (e.g. echo chambers). Moreover, social media data can be used to study violence escalation and de-escalation. For example, previous research has used the 2013 murder of Fusilier Lee Rigby in Woolwich as a case study to show that social media communication data can illuminate the inter- and intra-community conflict dynamics arising in the aftermath of violent terrorist-like events. This project identifies linguistic features that drive the nascence and formation of echo chambers in the public sphere following a terrorist attack.

Available datasets: Historical archive of 20M tweets.

 

Corruption Detection

Catchem: A Browser Plugin for the Panama Papers using Approximate String Matching

Students: Miika Moilanen and Arttu Niemelä, University of Oulu.

Abstract: The Panama Papers is a collection of 11.5 million leaked records that contain information on more than 214,488 offshore entities. In this paper, we present work in progress on a web browser plugin that detects company names from the Panama Papers and alerts the user by means of unobtrusive visual cues. We compare company names from the Public Works and Government Services Canada (PWGSC) against the Panama Papers using three different string matching methods. Monge-Elkan gives the best match results but is much slower than the other algorithms. Levenshtein gives reasonably good match results and is also fast. Jaccard is fast, but its matching performance is very poor if the names are modified even a little.

Dataset: Panama Papers and PWGSC.
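The three string matching methods compared in the abstract can be sketched as below. This is an illustration under stated assumptions, not the plugin's code: Jaccard is computed over whitespace-separated tokens, Monge-Elkan uses a normalised Levenshtein similarity as its inner measure, and the company names are invented.

```python
def levenshtein(a, b):
    """Edit distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def levenshtein_similarity(a, b):
    """Edit distance rescaled to [0, 1], where 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def jaccard(a, b):
    """Jaccard index over lower-cased word tokens (assumed tokenisation)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def monge_elkan(a, b):
    """Average, over tokens of `a`, of the best inner match in `b`.

    The inner measure is an assumption; note the score is not symmetric.
    """
    tokens_a, tokens_b = a.lower().split(), b.lower().split()
    if not tokens_a or not tokens_b:
        return 0.0
    best = [max(levenshtein_similarity(x, y) for y in tokens_b) for x in tokens_a]
    return sum(best) / len(best)


# Invented company names that differ by a single letter.
name_a, name_b = "Acme Holdings Ltd", "Acme Holding Ltd"
scores = {
    "levenshtein": levenshtein_similarity(name_a.lower(), name_b.lower()),
    "jaccard": jaccard(name_a, name_b),
    "monge_elkan": monge_elkan(name_a, name_b),
}
```

With these assumptions, a one-letter change halves the token-level Jaccard score while the other two measures stay high, which is consistent with the behaviour the abstract reports. Monge-Elkan runs an inner comparison for every token pair, which helps explain why it is the slowest of the three.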