Friday, September 30 2011
Improvement and correction over last post
By Guillaume Pitel on Friday, September 30 2011, 11:22 - nc-isc
I made a big mistake in my last post about our results improvements: the precision/recall curves of our experiments were in "gain" mode, that is, the curves tend toward zero as they approach the precision of a random selection. As a result, they finished near zero instead of at about 5% precision (which is what one should obtain for a 20-class problem with balanced classes). So our actual results are even slightly better than what you saw in my previous post.
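The 5% floor mentioned in this post is just the precision of a random selection with 20 balanced classes, 1/20. Here is a minimal sketch of the relation between a plain precision curve and a "gain" curve, assuming that "gain" means precision minus the random baseline (the post does not give the exact formula, so this definition is an assumption). The function names and the sample precision values are hypothetical and are not the results of our experiments.

```python
def random_precision(n_classes):
    """Expected precision of a random selection when classes are balanced."""
    return 1.0 / n_classes


def to_gain(precisions, baseline):
    """Shift a precision curve so that the random baseline maps to zero.

    Assumed definition of "gain": precision minus baseline.
    """
    return [p - baseline for p in precisions]


def from_gain(gains, baseline):
    """Inverse of to_gain: recover plain precision from a gain curve."""
    return [g + baseline for g in gains]


baseline = random_precision(20)  # 20 balanced classes

# Made-up precision values at increasing recall, ending at the random baseline.
precision_curve = [0.90, 0.60, 0.30, baseline]
gain_curve = to_gain(precision_curve, baseline)
recovered_curve = from_gain(gain_curve, baseline)
```

Under this assumption, a curve plotted in gain mode ends at zero, while the same curve in plain precision ends at the random baseline.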
Thursday, September 22 2011
Recent NC-ISC method improvement
By Guillaume Pitel on Thursday, September 22 2011, 22:07 - machine learning
Despite the already good results of our method, we have further improved it. The most notable improvement is due to a better handling of the numeric compatibility between the two final embeddings (for instance word and document embeddings).
Thursday, April 28 2011
eXenSa is born !
By Guillaume Pitel on Thursday, April 28 2011, 21:58 - recommendation
I have had a few busy days lately: eXenSa has officially existed since the 3rd of April. As I quickly sketched before, eXenSa will offer automated product recommendations to e-commerce sites. The most important innovation is that, by using and learning semantics from several e-shops, we will be able to make recommendations with little data from user actions. As a consequence, our target market will include much smaller e-shops than those served by other recommendation providers (this in addition to the excellent quality of our recommendation engine, of course). The current company website is still a bit thin; more to come soon.
Tuesday, December 14 2010
Evolution of email
By Guillaume Pitel on Tuesday, December 14 2010, 21:51
As a follow-up to my previous post about the unification of email and social media, here is an interesting infographic about the history of the thing, which basically shows one thing: it has not evolved a lot.
Friday, December 10 2010
Competition in the recommendation area
By Guillaume Pitel on Friday, December 10 2010, 16:23 - recommendation
Recommendation system : a quick overview of the problem and its relation with NC-ISC
By Guillaume Pitel on Friday, December 10 2010, 11:47 - recommendation
As a follow-up to my research on characterizing documents and/or vocabulary, some friends and I are preparing to launch a new recommendation service. Our first targets are eCommerce sites that can greatly benefit from improvements to the products they show to users (whether in the recommended items list or in the display order of search results). Amazon reportedly gets 30% of its sales directly from recommended items (and its recommendations are not unanimously acclaimed). Now the question is: why will my algorithm be better than the others, and why should you use it?
Monday, December 6 2010
Alternative to mail, blogging, micro-blogging, multi-facets publishing, chatting, etc.
By Guillaume Pitel on Monday, December 6 2010, 22:24 - social networking
So, after yesterday's post about an open protocol that could cover all of our social needs while preserving privacy, I've thought a lot. One thing that struck me is how much the concept of blogging/publishing/chatting is the opposite of how the mail system is designed. This brings me to some interesting ideas (or not, you tell me).
Diaspora the distributed social networking system
By Guillaume Pitel on Monday, December 6 2010, 21:45 - social networking
After I wrote my last post, I immediately thought "Darn, I've forgotten to talk about Diaspora". Fortunately, nobody has flamed me yet...
Sunday, December 5 2010
Standard protocol for Social Networking
By Guillaume Pitel on Sunday, December 5 2010, 22:41 - social networking
I was recently thinking about the fact that we were at the very beginning of the social networking era. And I was thinking to myself that, like HTTP vs. HyperCard, the only way to deal openly with social groups' real-time sharing needs is to develop an open protocol that would allow the development of social accounts (in the very same way that you have mail accounts) that you could easily create and control.
I believe we really are at the pre-standard age of the phenomenon (or should I say, the completion of the phenomenon, since emails and IRC were bits of our social life).
Thursday, September 16 2010
Performance of NC-ISC on Ohsumed
By Guillaume Pitel on Thursday, September 16 2010, 16:39 - nc-isc
Ohsumed is a well-known dataset consisting of 34389 documents classified among 23 classes (each document can belong to several classes). The documents use a vocabulary of 30689 words. The dataset split is a random 80/20. I've used the dataset provided on Alessandro Moschitti's corpora webpage, and more precisely the preprocessed data available on the Rate Adapting Poisson model site. Please note that I have not managed to reproduce the results (for LSI and TF-IDF) from "The Rate Adapting Poisson (RAP) model for Information Retrieval and Object Recognition" (Peter V. Gehler, Alex D. Holub and Max Welling, ICML 2006). A short email discussion with P. Gehler did not help much to understand the source of the difference. Anyway, the most interesting part resides in the difference between the raw data (TF-IDF) and the processed data.
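For readers who want to reproduce a split of this kind, here is a minimal sketch of a random 80/20 split over document indices. The post does not say how the split was actually drawn, so the function name, the seed and the shuffling procedure are all assumptions; only the document count (34389) and the 80/20 ratio come from the post.

```python
import random


def random_split(n_documents, train_fraction=0.8, seed=0):
    """Shuffle document indices and cut them into train and test parts.

    Hypothetical helper: the seed and the procedure are assumptions.
    """
    indices = list(range(n_documents))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    cut = int(n_documents * train_fraction)
    return indices[:cut], indices[cut:]


# 34389 is the number of Ohsumed documents given in the post.
train_ids, test_ids = random_split(34389)
```

Since each document can belong to several classes, the split is done on documents rather than on (document, class) pairs, so a document never ends up on both sides.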
My other neighbors
By Guillaume Pitel on Thursday, September 16 2010, 15:29 - nc-isc
OK, I've got a lot of things to show you, but unfortunately, not much time to prepare the data. Here is another comparison between LSA and my method (NC-ISC), for the neighbors of the word "200". I know it's not really a word, but at least it is easier to sort out the good neighbors from the bad ones with this kind of word, right?
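To make "neighbors of a word" concrete, here is a minimal sketch of a nearest-neighbor lookup in an embedding space. It assumes cosine similarity as the measure, which the post does not state, and it uses tiny made-up vectors; it is neither LSA nor NC-ISC, only the lookup step that would come after either of them. All names and values are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(word, embeddings, k=3):
    """Return the k words most similar to `word`, best first."""
    query = embeddings[word]
    scored = [
        (other, cosine(query, vector))
        for other, vector in embeddings.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional embeddings, invented for illustration only.
toy_embeddings = {
    "200": [1.0, 0.1, 0.0],
    "300": [0.9, 0.2, 0.0],
    "150": [0.8, 0.0, 0.1],
    "banana": [0.0, 0.1, 1.0],
}

neighbors = nearest_neighbors("200", toy_embeddings, k=2)
```

With a numeric "word" such as "200", good neighbors are easy to spot by eye: other numbers should rank above unrelated words.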
Friday, July 23 2010
My (1024) neighbours are nicer than yours :)
By Guillaume Pitel on Friday, July 23 2010, 21:47 - nc-isc
Tuesday, July 20 2010
Visualize methods performance
By Guillaume Pitel on Tuesday, July 20 2010, 14:37 - nc-isc
As I wrote in my last post, my method improves things a lot on the datasets I've tried. The most impressive result is on 20 newsgroups, mainly because I can easily use the full, unaltered original data (and thus that's what I do).
Saturday, July 17 2010
Retrospective January 2010 - June: Bounce back
By Guillaume Pitel on Saturday, July 17 2010, 16:14
January 2010 was a sleepless month. I had to give a hand to the team working on a virtual show web application. The project was late and missing the deadline was just unthinkable... you know what it means: all-nighters, bad mood and a terrible backlash right after. Then in February, my wife gave birth to my daughter, and more joy, love and happiness ensued. Fortunately, baby Ismalle was not too much of a threat to my nights. It was enough, though, to prevent me from doing anything related to my brain (I mean intellectual activity, right?) during evenings and weekends for at least 3 months.
Thursday, June 24 2010
Retrospective March 2009 - December 2009, I'm losing it
By Guillaume Pitel on Thursday, June 24 2010, 22:03
While developing my method on GPU, I had to go through a lot of matrix-related stuff and various kinds of factorization methods, and it finally occurred to me that, maybe, my fantastic method wasn't new at all, after all. Slowly, little by little, I discovered bits of things resembling it, and while I was still almost certain that no one in Computational Semantics had proposed it, I was starting to believe that I had just rediscovered an old method.
Retrospective April 2008- February 2009, first moments
By Guillaume Pitel on Thursday, June 24 2010, 21:37
So I've spent the last two years doing a lot of things; let's sum it up for those who want to know the details. In July 2008 I left the LIC2M (CEA-LIST) and joined Epiphyte, a company run by three friends of mine, who were kind enough to host me and let me do my research on the promising idea I had had just a few weeks before I left the LIC2M. I still don't really know what drove them to offer me such a nice situation, but I hope they didn't and won't regret it. Anyway, I started developing my idea.