The CICLing verifiability, reproducibility, and open source policy was largely inspired by Ted Pedersen's paper Empiricism is not a matter of faith, as well as discussions with various attendees of CICLing 2010 in Iași.
See also the FAQ and the information on your conference's page under the Software section.
Are we doing science or faith?
To err is human. What is more, getting a paper published translates into better university evaluation and, eventually, into money. Therefore, in principle a paper may report results that are erroneous, falsified, or made up. Reviewers spend a lot of effort checking English, style, clarity of explanation, and so on; but do they ever ask themselves whether the reported results are true in the first place? No, for a simple reason: there is no way to know. They have no choice but to trust the author. Is this science?
Many publications, any clarity?
Scientific results must be verifiable and reproducible. Can you imagine a physics paper that says "we built some apparatus, with some electrodes and lights (here is a photo), put 5.4321 grams of some blue substance into it, and it gave 1.2345% more energy than coal"? What have you learnt from such a paper? Now substitute algorithm for apparatus and corpus for substance: wouldn't you get quite a decent computational linguistics paper?
Many publications, any knowledge?
In 1637, Pierre de Fermat wrote: "I have discovered a truly marvelous proof <...>. This margin is too narrow to contain it". It took 358 years of intensive research before a proof of the theorem in question became known to anybody other than Fermat himself (if he really had a correct proof at all). Still, many of us find this formula tempting: "but space limitations do not allow us to go into details". It saves so much effort when we are in a hurry to meet the conference deadline. How many years will it take for other people to learn what we had in mind when writing this phrase? Are we then really communicating novel information to the reader, or are we boasting of our own intelligence, advertising our group, collecting university promotion points, or, worse, hiding details that the readers might find flawed? Does such science generate knowledge, or scientific spam?
Recently a colleague, a well-reputed, high-ranking researcher, explained to me why he did not want to provide an implementation of his algorithm with his paper: "if other groups see my implementation, then they will be able to improve their results using my programs, and then our group will have fewer opportunities to publish." That is perfectly normal reasoning in industry, but is he then doing science? Isn't the whole purpose of science to enable other people to improve their results? Do you think his printed paper was written with an honest effort to make it complete, understandable, and reproducible, or was it cynically generated scientific spam that mimics a research report?
Many publications, any programs?
I have graduated 50+ students, who have written a lot of programs... but does our group have access to any of these programs? Each time a student is about to graduate, we are too busy with the thesis to think of the programs, and after that I have no control over them anymore, so I am left with a thesis describing the program, but no program. Am I the only one? Recently a student of mine needed a parser for a specific language. We wrote one some 10 years ago, but wait: I have seen hundreds of recent papers on advances in such parsers. So I spent two days digging through the Web... a lot of papers, no parsers. Sound familiar?
Gift of knowledge or advertising?
Can you imagine a mathematical paper saying "a = b, but the proof is commercial property of our company"? Not every intellectual activity is science: science is producing knowledge for all, not for oneself. The fact that some closed-source software is good can be advertised, but it cannot be the object of a scientific publication. If you have a program, make money from it, but don't call it science unless you are willing to show others how to write such a program. Advertising disguised as a scientific publication is misleading, frustrating, and dishonest: it gives the author a double reward, money for the program and money from university promotion.
One paper that erroneously (or intentionally) reports inflated results on a task, while still looking plausible enough to be publishable, can completely block any progress in the field.
A student of mine suggested a new method for some task. She found a long paper published in a very respectable journal which reported higher results; in fact, a bit too high to be realistic, so she suspected an error. She spent months trying to reproduce the algorithm from the paper, and gave up: the description was not clear enough to write a program following it. When she asked the author, a very respectable scientist, he no longer remembered the details, and the student of his who had implemented the programs had graduated long ago. Even the corpus on which the results were reported had been lost with a broken laptop. Sound familiar? In the end, she had no way to publish her results, because there was no way to compare them with the previously published ones and because her figures were not as high as those reported in that paper.
So the current situation in that field of research is this: there is a paper published in a very prestigious journal that reports very optimistic figures; there is no way to know how one can achieve this quality in one's own system; there is no way to know whether those figures are true; and there is no way to publish new research in the field, because the figures that all new research obtains are lower than those reported in that paper. If, for whatever reason, those figures were erroneous, then no advance in this field is ever possible. One paper killed a whole field of science, in exchange for university promotion points for one person.
Only bad guys do bad things, but do we good guys need to be verified?
Another student of mine achieved a very good result with some algorithm. We considered her PhD graduation guaranteed and wrote a long paper describing the results, to be sent to a top journal. One day, when the paper was nearly ready, she was examining her code and accidentally noticed a subtle error, which she found to be critical for the results. She was honest enough to throw away the paper and notify the thesis jury (would every one of your students do that?); in the end she graduated much later, with other results. But imagine if on that sunny day she had had a date with her boyfriend instead of examining the code of her old program! Would we have killed a whole field of science?
I will use here examples from my own old work, in order not to offend anybody. But do you think examples from your own publication record could not have appeared here instead?
In the paper A Very Large Database of Collocations and Semantic Links, I reported the creation of a large lexical resource, which until now has not been made available to the public, even for a fee. What's the point in reporting something that nobody can use or even see?
In the paper Information Retrieval with a Simplified Conceptual Graph-like Representation, I reported an algorithm. I say there: "we used a simplified structure, which is basically syntactic structure minimally adapted to semantics represented in conceptual graphs." What's the point of saying that it was an adapted structure without explaining what specifically was adapted? These details could have been seen from the implementation, but it was not provided with the paper. What is more, I say there that we developed an English grammar that we used for parsing; however, the grammar itself is not available, and it is too large to be given in the paper. Without this exact grammar, our results are not reproducible, as they largely depend on the specific grammar we used. Now it is too late to make the grammar available: the published paper does not make any reference to where the grammar could be found.
In the paper Detecting Inflection Patterns in Natural Language by Minimization of Morphological Model, I reported an algorithm. However, much later, when I was once more examining the code, I found a threshold that was not mentioned in the paper: I had introduced it temporarily and forgotten to remove it. When I removed the threshold, the algorithm did not work as expected; the threshold seems to be essential for the algorithm to work. So the published algorithm is incomplete, not to say incorrect, and is not reproducible; however, the readers have no way to know this.
These publications were not made in bad faith; only later did I realize that I should not have done that. Worse, these papers were reviewed and accepted at decent conferences, because they do meet the usual standards of our science, and the reviewers did not see any problem with them! What has to change is our perception of what the usual standards are.
Porter's An algorithm for suffix stripping is accompanied by a page describing the algorithm itself, so that no doubts are left about how it works.
Umemura and Church's Substring Statistics is accompanied by a sample of code and data, referenced within the paper as [21].
Our paper An Associative Network of Concepts that Enter to Internet Queries, which reports the creation of a lexical resource, is accompanied by the dataset itself, referenced within the paper as [10]. Note that even though we have since made a new version of the resource available, the version reported in the paper is preserved and is clearly marked as such.
In all cases, what accompanies the paper is something simple, not a huge system with a fancy user interface, yet essential for a precise understanding of the algorithm or for the ability to use the information reported in the paper.
Publication of an algorithm or experimental results must be accompanied by software and data.
While a printed conference paper is too short to give all the details, we all have the Web nowadays: supplementary software and data can be published online and referenced from the paper.
One can object that this will make "papers" many times longer and will greatly increase the effort of preparing and reviewing them. Yes. Look at a math paper: 90% of it is the proof of what the other 10% says; and, of course, it took the author 90% of the effort to write this "useless" part, which does not add any new ideas but only serves to convince the reader that the ideas communicated in the other 10% of the text are correct. This is what makes mathematics a science. Unfortunately, our field has not yet developed this culture.
Scientific results must be open-source.
It is not enough to provide an executable or a binary resource with a scientific paper. Knowledge is human, so only human-readable resources serve the purpose of science: to advance human knowledge. Of course, they must also be machine-readable, so that people can reproduce the results.
One can object again that this will create copyright problems. I can only repeat: either do science or make profit; if you are not willing to give your knowledge to the community, then make money from it; don't make science of it.
I know there are many problems to solve, many objections to discuss, and many bad habits to get rid of. The way is long. Be part of the change!
Comments: A. Gelbukh.