CICLing Verifiability, reproducibility, and working description policy
Bottom line
In a restaurant you are offered not only a menu but also food. Same here. Your paper is the menu that describes the food you cooked. Code and data are the food.
In mathematics, a paper consists mainly of proofs; a paper without proofs is nonsense. Same here. The description of your work is the formulation of a theorem ("our method gives 3% better results"); code and data are the proof of the theorem.
What we should publish (= make public) in computer science are algorithms (code) and data (resources), not their descriptions (papers). In an ideal world, a paper would be only an attachment to the code/data. Not vice versa as we do now... or do we?
What to submit and why
Starting from 2011, CICLing implements a policy of giving preference to papers with verifiable and reproducible results:
If the authors claim to have obtained a result, we encourage them to make all the input data necessary to verify and reproduce the result available to the community.
If the authors claim to advance human knowledge by introducing an algorithm, we encourage them to make the algorithm itself -- and not only its (usually vague and incomplete) description -- available to the public.
If the authors claim to have compiled a lexical resource, we encourage them to make the resource itself -- and not only its description -- available to the public.
Code: We encourage the authors to submit, together with the paper, a program (open source), as simple as possible, that follows the described algorithm and generates the results presented in the paper. There is no need for a sophisticated interface or performance improvements -- only easy-to-understand source code that generates the claimed results. Think of it as a proof of a theorem: the result reported in your paper is a theorem, and the source code generating this result is its proof. Our sole purpose is to reproduce your results exactly and to be sure that they are reproduced with exactly the method you describe. A minimalistic approach is best: just implement your algorithm in a way that is simple to read and understand, nothing else. Please comment your code extensively. Naturally, input data are to be presented together with the code; when this is impossible, you can provide instructions on where the data can be obtained (e.g., WordNet, Google pentagrams, etc. need not be included with your code, but we do need instructions on how to obtain exactly the version you used; using standard software or corpora is highly preferable to home-made ones when possible).
Resource: We encourage the authors to submit, together with the paper, a program (open source), as simple as possible, that follows the described algorithm and generates the results presented in the paper. There is no need for a sophisticated interface or performance improvements -- only easy-to-understand source code that generates the claimed results. Think of it as a proof of a theorem: the result reported in your paper is a theorem, and the source code generating this result is its proof.
We will give a special award for best verifiability, reproducibility, and working description to the authors of the software that best fulfills the above goals (that is: the simplest and clearest code that proves the claims of the paper and allows one to reproduce its results exactly).
Submission of such code is not a requirement. For example, the nature of the paper may not require any additional data or code, or your experimental setting may not allow it, even after you have made every reasonable effort to make it possible. However, if the reviewers judge that the paper does require, and does allow, submission of the code and data in order to be verifiable and reproducible, then preference will be given to papers accompanied by the code. We do understand that you may not have had time to prepare the code. We will use common sense in applying this policy. If for any reason you cannot submit the code, go ahead and submit your paper normally.
What we ask for is not a demo or tool based on your paper, but a form of proof and working description of the algorithm, in addition to the verbal description given in your paper. An approximation of the idea is the code submitted with Church & Umemura's paper, to be permanently hosted on CICLing servers and cited in the paper (see last line). You see, we don't mean anything complicated. You can also show demo programs or tools based on your method, either as part of your talk or at the demo session (and we will be happy to host on our servers such software that complements your paper), but this is not required. In contrast, we do believe that a publicly published scientific paper must be accompanied by a minimal working description of the algorithm, open-source and available to the community. (In fact, even the other way round: the code ought to be accompanied by a paper.)
We do not ask for the impossible: if you present a large system, especially one that is commercially distributed or is the property of your company, then we do not expect you to provide its source code. Our point is that when the software and data can be provided, they should be provided.
We do not yet have specific rules: we hope to elaborate the rules based on our experience, so please use common sense. See the problems this policy is meant to address, as well as the list of the software reviewing committee and the instructions for the reviewers.
CICLing will keep the right, though no obligation, to host your files on its servers. Upon acceptance of your paper, we will give you a permanent link to the hosted data; please indicate this link in the camera-ready version of your paper. Please accompany your code with a suitable license that allows its free distribution, free study (and reverse-engineering if needed) by the public, and free use of the knowledge obtained from such a study. For future editions of CICLing we plan to elaborate a special CICLing license for academic code, documentation, and data; if you have any ideas or suggestions on such a document, please let us know. Any guidance is highly appreciated.
How to submit code / data
We recommend that you submit your code as a ZIP file attachment (in EasyChair, use the Attachment field on the paper submission page; if you didn't have time, try sending us the ZIP file by email later) containing in its root the following files and directories (you may choose another structure if it makes more sense):
- File README.*, in an appropriate format, such as PDF or TXT, with a complete and clear description of the contents and their use, including installation, compilation, running, analyzing the results, and matching the results to the claims of the paper. For example, if Table 1 claims that your algorithm gives 74% on your corpus, describe where this figure is in the output of your program. Please specify the versions of common software (such as Perl) on which you tested your program. Instead of including all material, the README file can point to other files, such as INSTALL.PDF.
- File LICENSE.*, in an appropriate format, specifying the license terms for your software, compatible with free distribution, studying, and reverse-engineering of your code. Anyone should be permitted to modify or use your software for any purpose. Wikipedia's approach to licensing is a good example of what science (i.e., advance of human knowledge) is meant to be. It would be good if this permission included the data (and not only the software), but this is not always feasible; use common sense. Please specify who the authors are and how to contact them (to reduce spam, we do not recommend including emails; consider pointing to the authors' webpages). The authors retain all author rights, even if they grant non-exclusive distribution permission to CICLing. Note: for the review stage you can omit the authors' names and contacts, but please don't forget to contact us for an update upon acceptance: it is very important that the user know it's your work.
- File LICENSE_CICLING.*, in an appropriate graphical format, should be a scanned image of a written document signed by at least one of the authors, which explicitly authorizes CICLing to host all the other files on its servers for unlimited time and to provide public access to them. (If you prefer so, you can provide this file upon acceptance of the paper.)
- Directory SOURCE with the source code of a program that implements your algorithm and produces the results you report. If possible use well-known major programming languages.
- Directory BIN with compiled executables for a major OS (Windows, Linux, or Mac).
- Directory UTILITIES with auxiliary programs that are not part of your algorithm. For example, if you use a grep program, you can put it here. Less common utilities must be included, along with their respective documentation.
- Directory INPUT with the input data necessary to produce the results, including corpora, dictionaries, grammars, or whatever may be needed.
- Directory OUTPUT with exactly the same output data that is expected to be produced if one follows your instructions. We will know that the program installed and ran correctly if by running your program we obtain exactly the same results.
- File RUN.*, such as RUN.BAT, that can be called without any parameters and converts the data from INPUT into the result in OUTPUT without any user intervention. The instructions should specify what file is to be called, possibly after some configuration.
- Other directories as appropriate, with meaningful names.
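Before zipping, the layout listed above can be checked with a few lines of code. The following is only a minimal sketch in Python, not an official CICLing tool: the function name `missing_items` and the two constant lists are hypothetical, and treating every listed entry as expected is an assumption, since another structure is allowed when it makes more sense.

```python
from pathlib import Path

# Entries named in the list above; "*" stands for any extension.
# Assumption of this sketch: every listed entry is expected to be present,
# although the policy allows a different structure where that makes sense.
EXPECTED_FILES = ["README.*", "LICENSE.*", "LICENSE_CICLING.*", "RUN.*"]
EXPECTED_DIRS = ["SOURCE", "BIN", "UTILITIES", "INPUT", "OUTPUT"]


def missing_items(root):
    """Return the expected files and directories not found directly under root."""
    root = Path(root)
    missing = []
    for pattern in EXPECTED_FILES:
        # A pattern is satisfied by any regular file in the root that matches it.
        if not any(p.is_file() for p in root.glob(pattern)):
            missing.append(pattern)
    for name in EXPECTED_DIRS:
        if not (root / name).is_dir():
            missing.append(name + "/")
    return missing
```

Calling `missing_items(".")` from the root of the unpacked archive returns an empty list when everything is in place; anything it reports may still be a deliberate omission (for example, LICENSE_CICLING.* can be provided upon acceptance).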
Please keep things as simple as possible (though not simpler), and the installation and use instructions as clear as possible. Usually this implies detailed and specific instructions, as well as a completely automatic script that performs all necessary steps without user intervention. In the text of your paper, make sure that the software reviewers can easily locate the description of the input data and the obtained results; section titles such as Main Algorithm, Experimental Methodology, and Experimental Results could be helpful. If the reviewers fail to quickly and easily install your program, run it, and interpret the results, they will probably give up and lower your score.
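One way to let a reader confirm that a run reproduced the expected results is to write the fresh results to a separate directory and compare them, file by file, with the reference copy in OUTPUT. The sketch below assumes exactly that setup; the function name `differing_files` and the idea of a separate results directory are illustrative choices, not part of the policy.

```python
import filecmp
from pathlib import Path


def differing_files(expected_dir, produced_dir):
    """Return relative paths of expected files that are missing from
    produced_dir or whose contents differ byte for byte."""
    expected_dir, produced_dir = Path(expected_dir), Path(produced_dir)
    problems = []
    for expected in sorted(expected_dir.rglob("*")):
        if not expected.is_file():
            continue  # directories are covered through the files inside them
        rel = expected.relative_to(expected_dir)
        produced = produced_dir / rel
        # shallow=False forces a comparison of file contents, not just metadata.
        if not produced.is_file() or not filecmp.cmp(expected, produced, shallow=False):
            problems.append(rel.as_posix())
    return problems
```

An empty list means every reference file was reproduced exactly. A byte-for-byte comparison is deliberately strict; if your output contains timestamps or other run-dependent values, the README should say which differences are expected.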
The archive should be as self-contained as possible: it is a good idea to include as much as possible -- compilers, interpreters, utilities used, etc. If possible, including a complete distribution of Perl or WordNet is better than relying on us to somehow find the version you used. Remember that science is done for eternity: your results should be reproducible in twenty years, when it may be impossible to find a specific version of a utility or compiler you used. If the resulting file is too large for the system to accept the upload, please submit your paper alone and contact us about the attachment.
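When a complete distribution cannot be bundled, recording the exact versions used is the next best thing. As a hedged illustration, assuming the experiments were run with Python (the policy itself mentions Perl only as an example), the sketch below collects interpreter, OS, and package versions for pasting into the README; the function names `environment_report` and `format_report` are hypothetical.

```python
import platform
from importlib import metadata


def environment_report(packages=()):
    """Collect interpreter, OS, and package versions as a dict of strings."""
    report = {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the absence explicitly rather than silently skipping it.
            report[name] = "not installed"
    return report


def format_report(report):
    """Render the report as sorted 'name: version' lines for the README."""
    return "\n".join(f"{key}: {value}" for key, value in sorted(report.items()))
```

For example, `print(format_report(environment_report(["nltk"])))` lists the Python and OS versions together with the installed version of the named package, where `nltk` is only a placeholder for whatever your code depends on.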
No double-blind policy for software: For the time being, we encourage the software to be anonymous but we do not require this. We understand that in some cases it can be impossible or too labor-intensive. If you really cannot make your software anonymous, then leave it as is.
More information: see FAQ.