Research projects in social network analysis/Wikis
and open source software


Enrichment of corporate data with information available on the Web is an emerging and fast-growing
research area in the linked-data domain. This data can greatly improve decision making in corporations and offer
a considerable strategic advantage. Analyzing the content of documents, videos, TV streams, websites, wikis, blogs and social
networks requires mastering semantic content analysis techniques and real-time 'Big Data' infrastructures. Here is the list of projects we are working on:

Social network and website analysis to improve stock-trading system decisions

Design of specialized stock-market extractors that can be orchestrated easily to obtain rich information,
in real time, that can help in the decision-making process of stock-trading systems. Interchangeable and interconnectable extractors all emit a simple signal (buy, sell or wait) together with a context indicator.


  Companies and specialists interested in stock-trading systems would like to improve the success rate
of their trading strategies using information that is currently not easily available. For them, it is hard to get,
synthesize and readily use real-time information from specialized websites (e.g. MarketWatch, Bloomberg,
Reuters, ...), influential traders (e.g. @PaulScolardi, @Burns277, @OptionsHawk, ...), specialized databases
(i.e. NASTRAQ, TAQ, OptionsMetrics, ...) and complex technical analysis tools (e.g. back testing,
optimization, scanners, alerts, personalized indicators, broker interfaces and real-time data collectors).


             WHAT WE ARE DOING  


Standardized, interconnectable and easily scalable extractors that can be integrated in real time. Each
specialized extractor emits a simple signal (buy, sell or wait, plus a context indicator) derived from a different information
source. These signals can enrich the strategies of a real-time stock-trading system.
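The extractor contract described above can be sketched in Python. This is an illustrative sketch only: the class names and the keyword lists are our own invention, not the project's actual code.

```python
from dataclasses import dataclass
from enum import Enum


class Signal(Enum):
    BUY = "buy"
    SELL = "sell"
    WAIT = "wait"


@dataclass
class ExtractorOutput:
    signal: Signal  # the simple trading signal
    context: dict   # context indicator (source, matched evidence, ...)


class Extractor:
    """Base class: every specialized extractor maps raw input to one signal."""

    def extract(self, raw: str) -> ExtractorOutput:
        raise NotImplementedError


class KeywordExtractor(Extractor):
    """Toy extractor: scans a headline for bullish/bearish keywords."""

    BULLISH = {"surge", "beat", "record"}
    BEARISH = {"drop", "miss", "lawsuit"}

    def extract(self, raw: str) -> ExtractorOutput:
        words = set(raw.lower().split())
        if words & self.BULLISH:
            return ExtractorOutput(Signal.BUY, {"source": "headline", "hits": sorted(words & self.BULLISH)})
        if words & self.BEARISH:
            return ExtractorOutput(Signal.SELL, {"source": "headline", "hits": sorted(words & self.BEARISH)})
        return ExtractorOutput(Signal.WAIT, {"source": "headline", "hits": []})


out = KeywordExtractor().extract("Quarterly profits beat expectations")
print(out.signal)  # Signal.BUY
```

Because every extractor emits the same `(signal, context)` pair, a trading strategy can consume a heterogeneous pool of extractors (websites, traders, databases) through one interface.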




              STUDENTS      Thomas Maketa, Emanuel Berndl and Thomas Weissgerber (Univ. of Passau, Germany), with the support
                                         of the International Finance Center of Yale University. Co-directed by Maher Kooli of ÉSG.


              TECHNOLOGY      Anno4J, AliBaba, Marmotta, Camel, Hadoop, Java/Python, RabbitMQ, SPARQL,
                                              RDF/XML, JSON, ElasticSearch/Kibana




Generic statistics engine (for high frequency trading)


This project was proposed by TickSmith, a Montreal startup specializing in Big Data
for the financial sector. Typical TickSmith customers are alternative trading systems (ATS)
and trading groups in financial institutions. Their platform can also be used by compliance departments and regulators.




Work with a massive amount of stock market data. The first objective aims at optimizing, scaling and generalizing,
using Scala/Spark technology, a software prototype that creates stock market statistics such as price-difference curves.
The second objective is to validate the results obtained and to ensure that new estimators can easily be added
to the proposed prototype architecture. Eventually, a user can provide his own formulas and statistics, and the engine
will run the formulas and provide the results automatically. Finally, the third objective of this project is to create a
post-implementation analysis module by adding the relevant performance statistics (i.e. volumes,
spreads, realized volatility, slippage, execution profiles, etc.).
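As an illustration of the kind of estimator the engine computes, here is a plain-Python sketch of VWAP (volume-weighted average price), one of the statistics handled by the prototype. The prototype itself runs on Scala/Spark; this shows only the underlying formula.

```python
def vwap(trades):
    """Volume-Weighted Average Price: sum(price * size) / sum(size)."""
    total_size = sum(size for _, size in trades)
    if total_size == 0:
        raise ValueError("no volume")
    return sum(price * size for price, size in trades) / total_size


# Three trades as (price, size) pairs:
trades = [(10.00, 100), (10.10, 300), (9.95, 100)]
# (10.00*100 + 10.10*300 + 9.95*100) / 500
print(round(vwap(trades), 4))  # 10.05
```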


             WHAT WE ARE DOING  


This figure shows the first version of the prototype. This first proof of concept parses general statistical formulas using a
parser based on "scala.util.parsing.combinator", which defines a mathematical grammar and can express the syntactic
elements of statistical formulas. It will therefore be possible to use this parser, through a user interface, so that a
user can build or adapt a formula interactively. In the second iteration, we developed the Scala code and RDDs
to execute four formulas (Volume, VWAP, VWAS and GK). We then conducted large-scale trials on an
Amazon cluster. Finally, during a third iteration, we generated LaTeX code from Java character strings (which contain
a formula) and rendered it graphically so that users can visually validate what they intend to calculate (see below):

  Example of simulation of the calculations on different Amazon instances

                          Example of the parallel processing efficiency using SparkUI
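The grammar-driven approach can be illustrated with a minimal recursive-descent parser in Python. The prototype uses Scala parser combinators; this sketch only mirrors the idea of parsing a formula string into a tree and evaluating it against named inputs.

```python
import operator
import re

# Tokens: numbers, variable names, or single-character operators.
TOKEN = re.compile(r"\s*(?:(\d+\.?\d*)|([A-Za-z_]\w*)|(.))")


def tokenize(src):
    for num, name, op in TOKEN.findall(src):
        if num:
            yield ("num", float(num))
        elif name:
            yield ("var", name)
        elif op.strip():
            yield ("op", op)
    yield ("end", None)


class Parser:
    def __init__(self, src):
        self.toks = list(tokenize(src))
        self.i = 0

    def peek(self):
        return self.toks[self.i]

    def next(self):
        tok = self.toks[self.i]
        self.i += 1
        return tok

    def expr(self):  # expr ::= term (("+" | "-") term)*
        node = self.term()
        while self.peek() in (("op", "+"), ("op", "-")):
            _, op = self.next()
            node = (op, node, self.term())
        return node

    def term(self):  # term ::= factor (("*" | "/") factor)*
        node = self.factor()
        while self.peek() in (("op", "*"), ("op", "/")):
            _, op = self.next()
            node = (op, node, self.factor())
        return node

    def factor(self):  # factor ::= number | variable | "(" expr ")"
        kind, val = self.next()
        if kind in ("num", "var"):
            return val
        if (kind, val) == ("op", "("):
            node = self.expr()
            self.next()  # consume ")"
            return node
        raise SyntaxError(f"unexpected {val!r}")


OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def evaluate(node, env):
    if isinstance(node, float):
        return node
    if isinstance(node, str):
        return env[node]  # variable lookup
    op, left, right = node
    return OPS[op](evaluate(left, env), evaluate(right, env))


# e.g. a VWAP-like formula over precomputed aggregates:
tree = Parser("pv / v").expr()
print(evaluate(tree, {"pv": 5025.0, "v": 500.0}))  # 10.05
```

In the actual engine, the evaluation step is replaced by Spark jobs over RDDs, but the grammar-to-tree structure is the same.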


              STUDENTS      Philippe Grenier-Vallée and Luiz Fernando Santos Pereira


              TECHNOLOGY      Spark 2.0, Scala, Java, Scala Parser Combinators, JLatexmath, JSON, AWS EMR,
                                              Maven, BitBucket, Docker





Wiki analysis (an open science project)

The current academic publication process is long, costly, and the result is often private. The open data movement, recently supported again by the European Union, is trying to improve this process. This project aims at letting a publication evolve permanently as a living document, allowing the community of experts to debate and propose changes to the publication, making it a collective work.



This project explores the notion of living publications, where the community can contribute to an existing publication. This requires a platform as
well as a mediation process that allows a community of experts to discuss and modify an existing publication over time. Many challenges have to be addressed: who is allowed to participate and change the publication; how do we manage change propositions; how do we evaluate major/minor contributions; how do we recognize when a new version should be packaged;
how can contributors claim a contribution; how do we assess the most influential/reputable contributors; etc.
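A minimal, purely illustrative data model for some of these mediation questions (proposal status, contribution claims, version packaging). Every name below is hypothetical and is not part of the actual platform.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Proposal:
    author: str
    summary: str
    status: str = "open"  # open -> accepted | rejected


@dataclass
class LivingPublication:
    title: str
    version: int = 1
    contributors: List[str] = field(default_factory=list)
    proposals: List[Proposal] = field(default_factory=list)

    def propose(self, author, summary):
        """A community member submits a change proposition."""
        p = Proposal(author, summary)
        self.proposals.append(p)
        return p

    def accept(self, proposal, major=False):
        """Mediation outcome: record the contribution claim; package a new
        version only when the change is judged major."""
        proposal.status = "accepted"
        if proposal.author not in self.contributors:
            self.contributors.append(proposal.author)
        if major:
            self.version += 1


pub = LivingPublication("On Living Documents")
p = pub.propose("alice", "clarify section 2")
pub.accept(p, major=True)
print(pub.version, pub.contributors)  # 2 ['alice']
```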

             WHAT WE ARE DOING  


The GRISOU research lab has offered its adapted wiki (a MediaWiki adaptation) to experiment with these concepts. The
IEEE Computer Society has accepted to conduct a trial using 4 existing IEEE Software publications. This trial will
be an early validation of the possible mechanisms that allow this collaboration. Following this trial, IEEE Software will
publish a special issue to present the findings.

You can try this out here:



Improving the precision of queries



This project looks at how the accuracy of existing query engines can be improved using semantic techniques.
Google is the world leader in this field with, for example, its Machine Translation for Query
Expansion, its snippets and its Statistical Machine Translation techniques. The project continues
M'hammed Oulaidi's first attempt, which used a multilingual thesaurus based on Ginco and improved the
success rate.
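The thesaurus-based expansion idea can be sketched as follows. This is a toy example: the thesaurus entries below are invented for illustration, not taken from Ginco or WordNet.

```python
# Toy multilingual thesaurus mapping a term to its synonyms/translations.
THESAURUS = {
    "car": {"automobile", "voiture"},
    "movie": {"film", "cinema"},
}


def expand(query):
    """Return, for each query term, the set of interchangeable terms
    (the term itself plus its thesaurus entries), for OR-style matching."""
    expanded = []
    for term in query.lower().split():
        expanded.append({term} | THESAURUS.get(term, set()))
    return expanded


# Each position in the query becomes a set of alternatives:
print([sorted(s) for s in expand("car movie")])
# [['automobile', 'car', 'voiture'], ['cinema', 'film', 'movie']]
```

A search engine can then match a document if, for every position, at least one alternative occurs, which raises recall without the user rewriting the query.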


              TECHNOLOGY       Mallet, TextBlob, CouchDB, Google Translate API, WordNet, NLTK, Stanford NLP parser, Solr



Conversion to open source software



Converting your office to open source software is an emerging topic and is still a complex project. Follow the discussion
on the ÉTS blog. Should a company, and especially a government organisation, renew its Microsoft licences or move to
OpenOffice? How feasible is this project today? Are the licence and migration costs the only costs that should be
considered in such a project? Research in this area is trying to understand how these projects are done and how they
address the many challenges (overview of the conversion project, in French).

Conversion guides (in French): open source replacement product identification, impact analysis
and migration project control.

Example identification study (in French);

  - Open source alternatives for supporting software development (in French)