Mostly Linguistically Computational


Friday, September 30 2011

Improvement and correction over last post

I made a big mistake in my last post about our results improvements: the precision/recall curves of our experiments were in "gain" mode, that is, the curves tend toward zero as they approach the precision of a random selection. As a result, the curves finished near zero instead of at about 5% precision (which is what one should obtain for a 20-class problem with balanced classes). So our actual results are even slightly better than what you saw in my previous post.
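To make the 5% figure concrete, here is a minimal sketch of the arithmetic. The random baseline for balanced classes is simply one over the number of classes. The `to_gain` and `from_gain` functions are an assumption for illustration only: the post does not give the exact definition of "gain" mode, so a common rescaling (0 at the random baseline, 1 at perfect precision) is used here.

```python
def random_precision(n_classes):
    """Expected precision of a random selection when classes are balanced."""
    return 1.0 / n_classes


def to_gain(precision, baseline):
    """Hypothetical 'gain' rescaling (an assumption, not the post's definition):
    0 at the random baseline, 1 at perfect precision."""
    return (precision - baseline) / (1.0 - baseline)


def from_gain(gain, baseline):
    """Invert the hypothetical gain rescaling to recover plain precision."""
    return gain * (1.0 - baseline) + baseline


baseline = random_precision(20)         # 20 balanced classes
end_of_curve = to_gain(baseline, baseline)
print(baseline)       # 0.05
print(end_of_curve)   # 0.0
```

Under this assumed definition, a curve plotted in gain mode always sits at or below the plain precision curve, which matches the remark that the actual results are slightly better than the ones shown.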

Continue reading...

Thursday, September 22 2011

Recent NC-ISC method improvement

Despite the already good results of our method, we have further improved it. The most notable improvement comes from better handling of the numeric compatibility between the two final embeddings (for instance, the word and document embeddings).

Continue reading...

Thursday, April 28 2011

eXenSa is born!

I have had a few busy days lately: eXenSa has officially existed since the 3rd of April. As I quickly sketched before, eXenSa will offer automated product recommendations to e-commerce sites. The most important innovation is that, by using and learning semantics from several e-shops, we will be able to make recommendations from very little user-action data. As a consequence, our target market will include much smaller e-shops than those served by other recommendation providers (this in addition to the excellent quality of our recommendation engine, of course). The current company website is still a bit thin; more to come soon, of course.

Tuesday, December 14 2010

Evolution of email

As a follow-up to my previous post about the unification of email and social media, here is an interesting infographic about the history of the thing, which basically shows one thing: it has not evolved a lot.

Friday, December 10 2010

Competition in the recommendation area

Recommendation engines are nothing new. Many algorithms can be used to obtain results of variable quality, and a lot of people are jumping on board (including myself). Now that the eCommerce market is really mature, and now that anyone with a bit of technical knowledge and a lot of will can start an eShop, everybody is going to look for systems that they can plug into their shop in order to obtain better product recommendations.

Continue reading...

Recommendation system: a quick overview of the problem and its relation with NC-ISC

As a follow-up to my research on characterizing documents and/or vocabulary, we (some friends and myself) are preparing to launch a new recommendation service. Our first targets are eCommerce sites that can greatly benefit from improvements to the products they show to users (whether in the recommended items list or in the display order of a search query). Amazon reportedly gets 30% of its sales directly from recommended items (and their recommendations are not unanimously acclaimed). Now the question will be: why will my algorithm be better than the others? And why should you use it?

Continue reading...

Monday, December 6 2010

Alternative to mail, blogging, micro-blogging, multi-facets publishing, chatting, etc.

So after yesterday's post about an open protocol that could cover all of our social needs while preserving privacy, I've thought a lot. One thing that struck me is how much the concept of blogging/publishing/chatting is the opposite of how the mail system is made. This brings me to some interesting ideas (or not, you tell me).

Continue reading...

Diaspora the distributed social networking system

After I wrote my last post, I immediately thought "Darn, I've forgotten to talk about Diaspora". Fortunately, nobody has flamed me yet...

Continue reading...

Sunday, December 5 2010

Standard protocol for Social Networking

I was recently thinking about the fact that we were at the very beginning of the social networking era. And I was thinking to myself that, like HTTP vs. HyperCard, the only way to deal openly with social groups' real-time sharing needs is to develop an open protocol that would allow the development of social accounts (in the very same way that you have mail accounts) that you could easily create and control.

I believe we really are at the pre-standard age of the phenomenon (or should I say, the completion of the phenomenon, since emails and IRC were already bits of our social life).

Continue reading...

Thursday, September 16 2010

Performance of NC-ISC on Ohsumed

Ohsumed is a well-known dataset consisting of 34389 documents classified among 23 classes (each document can belong to several classes). The documents use a vocabulary of 30689 words. The dataset split is a random 80/20. I've used the dataset provided on Alessandro Moschitti's corpora webpage, and more exactly the preprocessed data available on the Rate Adapting Poisson model site. Please note that I have not managed to reproduce the results (for LSI and TF-IDF) from The Rate Adapting Poisson (RAP) model for Information Retrieval and Object Recognition - Peter V. Gehler, Alex D. Holub and Max Welling, ICML 2006. A short email discussion with P. Gehler did not help much to understand the source of the difference. Anyway, the most interesting stuff resides in the difference between the raw data (TF-IDF) and the processed data.
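For readers who want to picture the random 80/20 split, here is a minimal sketch. The function name, the fixed seed and the use of plain document indices are assumptions for illustration; the post does not say how the split was actually drawn.

```python
import random


def random_split(n_documents, train_fraction=0.8, seed=0):
    """Shuffle document indices and cut them into train and test parts.

    The seed is an arbitrary choice that makes the sketch reproducible.
    """
    indices = list(range(n_documents))
    random.Random(seed).shuffle(indices)
    cut = int(train_fraction * n_documents)
    return indices[:cut], indices[cut:]


# 34389 is the Ohsumed document count quoted in the post.
train_ids, test_ids = random_split(34389)
print(len(train_ids), len(test_ids))
```

Since each document can belong to several classes, a split like this is done per document, not per class label.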

Continue reading...

My other neighbors

OK, I've got a lot of things to show you, but unfortunately, not much time to prepare the data. Here is another comparison between LSA and my method (NC-ISC), for the neighbors of the word "200". I know it's not really a word, but at least it is easier to sort out the good neighbors from the bad ones with this kind of word, right?

Continue reading...

Friday, July 23 2010

My (1024) neighbours are nicer than yours :)

In this post I show you a few (sort of) neighbours of "yellow". I compare the neighbourhoods of the word vectors obtained from LSI (or more precisely my fast variant of LSI) and from my method, "NC-ISC". The original cooccurrence matrix has been obtained from a merge of the Europarl corpus and the BNC. No preprocessing has been applied to the corpus, not even lowercasing.
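As a rough illustration of what "neighbours of a word vector" means, here is a minimal sketch that ranks words by cosine similarity. Cosine similarity is an assumption (the post does not name the measure used), and the three-dimensional `toy_vectors` are made up for the example; they are not output from LSI or NC-ISC.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbours(word, vectors, k=3):
    """Return the k words most similar to `word`, best first."""
    target = vectors[word]
    scored = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up vectors, for illustration only.
toy_vectors = {
    "yellow": [0.9, 0.1, 0.0],
    "green": [0.8, 0.2, 0.1],
    "red": [0.85, 0.05, 0.2],
    "parliament": [0.0, 0.9, 0.4],
}

neighbours = nearest_neighbours("yellow", toy_vectors)
print([w for w, _ in neighbours])
```

Comparing two embedding methods then amounts to running the same ranking on each method's vectors and looking at which neighbour lists make more sense.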

Continue reading...

Tuesday, July 20 2010

Visualize methods performance

As I wrote in my last post, my method improves things a lot on the datasets I've tried. The most impressive improvement is on 20 newsgroups, mainly because I can easily use the full, unaltered original data (and thus that's what I do).

Continue reading...

Saturday, July 17 2010

Retrospective January 2010 - June: Bounce back

January 2010 was a sleepless month. I had to give a hand to the team working on a virtual show web application. The project was late and missing the deadline was just unthinkable. You know what that means: all-nighters, bad mood and a terrible backlash right after. Then in February, my wife gave birth to my daughter: more joy, love and happiness ensued. Fortunately, baby Ismalle was not too much of a threat to my nights. It was enough, though, to keep me from doing anything related to my brain (I mean intellectual activity, right?) during evenings and weekends for at least 3 months.

Continue reading...

Thursday, June 24 2010

Retrospective March 2009 - December 2009, I'm losing it

While developing my method on GPU, I had to go through a lot of matrix-related stuff and various kinds of factorization methods, and it finally occurred to me that, maybe, my fantastic method wasn't new at all, after all. Slowly, little by little, I discovered bits of things resembling it, and while I was still almost certain that no one in Computational Semantics had proposed it, I was starting to believe that I had just rediscovered an old method.

Continue reading...

Retrospective April 2008 - February 2009, first moments

So I've spent the last two years doing a lot of things; let's sum it up for those who want to know the details. In July 2008 I left the LIC2M (CEA-LIST) and joined Epiphyte, a company run by three friends of mine, who were kind enough to host me and let me do my research on the promising idea I had had just a few weeks before I left the LIC2M. I still don't really know what drove them to offer me such a nice situation, but I hope they didn't and won't regret it. Anyway, I started developing my idea.

Continue reading...