Enriching corporate data with information available on the Web is an emerging and fast-growing research area in the linked-data domain. This data can greatly improve decision making in corporations and offer a considerable strategic advantage. Analyzing the content of documents, videos, TV streams, Web sites, wikis, blogs and social networks requires mastering semantic content analysis techniques and real-time 'Big Data' infrastructures. Here is the list of projects we are working on:
Social network and website analysis to improve stock-trading system decisions
CHALLENGE
WHAT WE HAVE DONE
Standardized, interconnectable and easily scalable extractors that can be integrated in real time. Each specialized extractor emits a simple signal (e.g. buy, sell or wait, plus a context indicator) from a different information source. These signals can be useful to enrich the strategies of a real-time stock-trading system.
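The signal emitted by an extractor could be modelled roughly as follows. This is only a minimal sketch: the names `Action`, `Signal` and `keyword_extractor`, the keyword rules and the text-based context indicator are all assumptions made for illustration, not the project's actual design.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class Action(Enum):
    """The three simple signals mentioned above."""
    BUY = "buy"
    SELL = "sell"
    WAIT = "wait"


@dataclass(frozen=True)
class Signal:
    source: str     # hypothetical name of the information source
    action: Action  # buy, sell or wait
    context: str    # context indicator (assumed here to be a text excerpt)


def keyword_extractor(source: str, texts: Iterable[str]) -> List[Signal]:
    """Toy extractor: one signal per text, using naive keyword matching."""
    signals = []
    for text in texts:
        lowered = text.lower()
        if "beats expectations" in lowered:
            action = Action.BUY
        elif "profit warning" in lowered:
            action = Action.SELL
        else:
            action = Action.WAIT
        # Keep a short excerpt of the text as the context indicator.
        signals.append(Signal(source=source, action=action, context=text[:40]))
    return signals
```

Because every extractor returns the same `Signal` shape, a downstream trading system could consume signals from any source without knowing how each one was produced.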
STUDENTS Thomas Maketa, Emanuel Berndl and Thomas Weissgerber (University of Passau, Germany), with the support of the International Finance Center of Yale University. Co-directed by Maher Kooli of ÉSG.
TECHNOLOGY Anno4J, Alibaba, Marmotta, Camel, Hadoop, Java/Python, RabbitMQ, SPARQL, RDF/XML, JSON, Elasticsearch/Kibana
This project was proposed by Revelate, a Montreal startup specializing in Big Data for the financial sector. Typical customers of Revelate are alternative trading systems (ATS) and trading groups in financial institutions. Their platform can also be used by compliance departments and regulators.
Work with a massive amount of stock market data. The first objective is to optimize, scale and generalize, using Scala/Spark technology, a software prototype that computes stock market statistics such as price difference curves. The second objective is to validate the results obtained and to ensure that new estimators can easily be added to the proposed prototype architecture. Eventually, users will be able to provide their own formulas and statistics, and the engine will run them and return the results automatically. Finally, the third objective of this project is to create a post-implementation analysis module by adding relevant performance statistics (i.e. volumes, spreads, realized volatility, slippage, execution profiles, etc.).
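One way to make new estimators easy to add is a small registry that the engine iterates over. The sketch below is written in plain Python rather than Scala/Spark, and the names `ESTIMATORS`, `register`, `price_range` and `run_all` are hypothetical. Realized volatility is computed here as the square root of the sum of squared log returns, which is one common definition and not necessarily the one used in the prototype.

```python
import math
from typing import Callable, Dict, List

Estimator = Callable[[List[float]], float]

# Hypothetical registry: estimator name -> function over a price series.
ESTIMATORS: Dict[str, Estimator] = {}


def register(name: str) -> Callable[[Estimator], Estimator]:
    """Decorator that adds an estimator to the registry under `name`."""
    def decorator(fn: Estimator) -> Estimator:
        ESTIMATORS[name] = fn
        return fn
    return decorator


@register("realized_volatility")
def realized_volatility(prices: List[float]) -> float:
    # Square root of the sum of squared log returns (assumed definition).
    returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    return math.sqrt(sum(r * r for r in returns))


@register("price_range")
def price_range(prices: List[float]) -> float:
    # Difference between the highest and lowest price in the series.
    return max(prices) - min(prices)


def run_all(prices: List[float]) -> Dict[str, float]:
    """Run every registered estimator on the same price series."""
    return {name: fn(prices) for name, fn in ESTIMATORS.items()}
```

Adding a statistic then only requires writing one decorated function; the `run_all` loop does not change.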
This figure shows the first version of the prototype. This first proof of concept generalizes statistical formulas using a parser based on "scala.util.parsing.combinator", which defines a mathematical grammar able to express the syntactic elements of statistical formulas. It will therefore be possible to use this parser, through a user interface, so that a user can build or adapt a formula interactively. In the second iteration, we developed the Scala code and RDDs to execute four formulas (Volume, VWAP, VWAS and GK). We then conducted large-scale trials on an Amazon cluster. Finally, during a third iteration, we generated LaTeX code from Java character strings (each containing a formula) and rendered it graphically so that users can visually validate what they intend to calculate (see below):
Example of simulation of the calculations on different Amazon instances
Example of the parallel processing efficiency using SparkUI
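To make two of the four formulas concrete, here is a minimal single-machine sketch of Volume and VWAP (volume-weighted average price) in plain Python. The `(price, quantity)` trade representation and the function names are assumptions for illustration. The prototype itself ran on Scala and Spark RDDs, where the two sums would be computed in parallel across the cluster.

```python
from typing import List, Tuple

# Hypothetical trade representation: (price, quantity).
Trade = Tuple[float, int]


def total_volume(trades: List[Trade]) -> int:
    """Volume: total quantity traded."""
    return sum(quantity for _, quantity in trades)


def vwap(trades: List[Trade]) -> float:
    """VWAP: sum(price * quantity) / sum(quantity)."""
    volume = total_volume(trades)
    if volume == 0:
        raise ValueError("VWAP is undefined when no quantity was traded")
    return sum(price * quantity for price, quantity in trades) / volume
```

For example, `vwap([(10.0, 100), (20.0, 300)])` weights the second price three times as heavily as the first, giving (1000 + 6000) / 400 = 17.5.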
STUDENTS Philippe Grenier-Vallée and Luiz Fernando Santos Pereira
TECHNOLOGY Spark 2.0, Scala, Java, Scala Parser Combinators, JLaTeXMath, JSON, AWS EMR, Maven, Bitbucket, Docker
Wiki analysis (an open science project)
The current academic publication process is long and costly, and the result is often private. The open data movement, recently supported once again by the European Union, is trying to improve this process. This project aims to let a publication evolve permanently as a living document, allowing the community of experts to debate and propose changes to it, making it a collective work.
The notion of living publications, where the community can contribute to an existing publication, requires a platform as well as a mediation process that allows a community of experts to discuss and modify an existing publication over time. Many challenges have to be addressed: who is allowed to participate and change the publication; how do we manage change proposals; how do we evaluate major/minor contributions; how do we recognize when a new version should be packaged; how can contributors claim a contribution; how do we assess the most influential/reputable contributors; etc.
WHAT WE HAVE DONE
The GRISOU research lab has offered its adapted wiki (a MediaWiki adaptation) to experiment with these concepts. The IEEE Computer Society has agreed to conduct a trial using 4 existing IEEE Software publications. This trial will be an early validation of the possible mechanisms to allow this collaboration. Following this trial, IEEE Software will publish a special issue to present the findings. You can try this out here: www.grisouwiki.org
Improving the precision of queries
This project looks at how the accuracy of existing query engines can be improved using semantic techniques. Google is the world leader in this field with, for example, Machine Translation for Query Expansion, its snippets and Statistical Machine Translation techniques. This project continues M'hammed Oulaidi's first attempt, which used a multilingual thesaurus based on Ginco and improved the success rate.
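Thesaurus-based query expansion can be illustrated with a very small sketch. The thesaurus entries, the function names and the `OR`-joined output are all assumptions made for illustration. They do not reproduce the Ginco data model or the approach actually implemented in the first attempt.

```python
from typing import Dict, List

# Hypothetical multilingual thesaurus: term -> synonyms and translations.
THESAURUS: Dict[str, List[str]] = {
    "car": ["automobile", "voiture"],
    "insurance": ["assurance"],
}


def expand_query(query: str, thesaurus: Dict[str, List[str]]) -> List[str]:
    """Return the query terms followed by their thesaurus equivalents."""
    expanded: List[str] = []
    for term in query.lower().split():
        for candidate in [term] + thesaurus.get(term, []):
            if candidate not in expanded:  # keep order, drop duplicates
                expanded.append(candidate)
    return expanded


def to_or_query(terms: List[str]) -> str:
    """Join the terms with OR, as a search engine query string might."""
    return " OR ".join(terms)
```

A query such as "Car insurance" would then also match documents that only mention "automobile", "voiture" or "assurance", which is how expansion can raise the success rate for multilingual collections.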
TECHNOLOGY Mallet, TextBlob, CouchDB, Google Translate API, WordNet, NLTK, Stanford NLP parser, Solr
Conversion to open source software
Converting your office to open source software is an emerging topic and is still a complex project. Follow the discussion on the ÉTS blog. Should a company, and especially a government organisation, renew its Microsoft licences or move to OpenOffice? How feasible is this project today? Are licence and migration costs the only costs that should be considered in such a project? Research in this area is trying to understand how these projects are done and how they address the many challenges (overview of the conversion project, in French). Conversion guides (in French): open source replacement product identification, impact analysis and project migration control.
- Example identification study (in French)
- Open source alternatives for supporting software development (in French)