All teams must demonstrate that, when making predictions on the test set, they are 1) simulating forecasting faithfully and 2) not using restricted data sources. While there is no restriction on how models are trained, teams should ensure the training procedure does not utilize or leak test set information, as described in more detail below.
We reserve the right to reject and disqualify submissions that achieve high accuracy without legitimately increasing machine forecasting ability (for instance, by overfitting to the test set; we treat an excessive number of overly tuned hyperparameters, or a large accuracy drop on fresh forecasting questions, as evidence of this). To this end, the organizers have curated a secret test set to inform the ranking of top submissions and will actively monitor for rule-breaking.
How can I demonstrate faithful simulation of forecasting?
A complete submission must contain a short writeup explaining how the team ensured that it simulated forecasting faithfully.
- Describe how the data sources satisfy the rules. Links should be provided; otherwise, the data and the curation process must be shared with the organizers.
- Describe how the pre-trained models used satisfy the rules.
- Describe the training and prediction methods and how they satisfy the rules.
Example writeup: “We use the FiD architecture with the pretrained weights of the T5-v1.1 model and fine-tune on Autocast questions that were resolved before May 2021. T5-v1.1 was trained on C4, which was uploaded to Hugging Face in March 2021. Aside from the question and its background, we also use BM25 to retrieve additional information from both the C4 and CC-NEWS datasets, using the question as the query. In both training and prediction, we only retrieve news articles with a timestamp before the question cutoff time described in the rules. We assumed the Anywhere on Earth timezone for news articles due to a lack of timezone information.”
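To illustrate the kind of timestamp filtering the example writeup describes, here is a minimal sketch of cutoff-aware retrieval using the rank_bm25 package. The article records and question text are hypothetical placeholders, not part of any official tooling; in practice the timestamps would come from corpus metadata such as CC-NEWS publish dates.

```python
from datetime import datetime, timezone
from rank_bm25 import BM25Okapi  # pip install rank-bm25

# Hypothetical article records (publish_time, text); real timestamps would
# come from corpus metadata (e.g., CC-NEWS publish dates).
articles = [
    (datetime(2021, 3, 1, tzinfo=timezone.utc), "central bank signals rate hike this year"),
    (datetime(2022, 5, 1, tzinfo=timezone.utc), "rates were raised last month"),  # after cutoff: excluded
]

def retrieve(question: str, cutoff: datetime, k: int = 5) -> list[str]:
    """Rank only articles published strictly before the cutoff time."""
    eligible = [text for published, text in articles if published < cutoff]
    bm25 = BM25Okapi([doc.split() for doc in eligible])
    scores = bm25.get_scores(question.split())
    ranked = sorted(zip(scores, eligible), key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:k]]

cutoff = datetime(2022, 3, 22, 23, 59, 59, tzinfo=timezone.utc)
print(retrieve("will the central bank raise rates", cutoff))
```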
Teams must open-source their methods, code, models, data, and any additional information needed to reproduce the experimental results upon request by the organizers.
What does simulating forecasting faithfully mean?
Forecasters predicting an event only have access to information available before a cutoff time; using anything published later would constitute information leakage. Each question in the test set has a close time in UTC. Normally, to avoid leakage, only information up to the close time can be used to form a prediction for the question. However, to avoid confusion and inaccuracies, we set the cutoff to the end of the day before the close date. For example, for a question closing at Mar 23, 2022, 10:19 UTC, the cutoff time would be Mar 22, 2022, 23:59:59 UTC.
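For concreteness, here is a minimal sketch of this cutoff computation; the function name is ours, not part of any competition tooling.

```python
from datetime import datetime, timedelta, timezone

def cutoff_time(close_time: datetime) -> datetime:
    """Cutoff is 23:59:59 UTC on the day before the question's close date."""
    day_before = close_time.astimezone(timezone.utc).date() - timedelta(days=1)
    return datetime(day_before.year, day_before.month, day_before.day,
                    23, 59, 59, tzinfo=timezone.utc)

# Reproduces the example from the rules above.
close = datetime(2022, 3, 23, 10, 19, tzinfo=timezone.utc)
assert cutoff_time(close) == datetime(2022, 3, 22, 23, 59, 59, tzinfo=timezone.utc)
```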
There are two common sources of information leakage:
- Leakage through the data: We require all data sources (news articles, academic writings, social media posts, text corpora, web pages, etc.) used at any point to have a clear publish time. Before outputting a prediction for a test question, any exposure to data (training, fine-tuning, validating, retrieval, prompting, etc.) published later than the cutoff time leaks future information and violates the rules (see the filtering sketch after this list). The one exception is custom data created after the cutoff time but carefully curated to prevent information leakage.
- Leakage through the model: We require all pre-trained models used at any point to have a clear release time or direct proof that their training data does not create leakage. Before making a prediction on a test question, any use of a model exposed to data published later than the cutoff time leaks future information and violates the rules.
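A minimal sketch of the data-side check, assuming each document carries a publish_time field; the record layout and function name are illustrative, not part of the official rules or tooling.

```python
from datetime import datetime, timezone

# Illustrative corpus records; the rules require a verifiable publish time
# for every data source used at any point.
corpus = [
    {"publish_time": datetime(2021, 2, 14, tzinfo=timezone.utc), "text": "..."},
    {"publish_time": None, "text": "..."},  # no clear publish time: must be dropped
]

def usable_before(records: list[dict], cutoff: datetime) -> list[dict]:
    """Keep only documents with a known publish time strictly before the cutoff."""
    return [
        r for r in records
        if r["publish_time"] is not None and r["publish_time"] < cutoff
    ]

cutoff = datetime(2021, 5, 10, 23, 59, 59, tzinfo=timezone.utc)
train_corpus = usable_before(corpus, cutoff)
assert len(train_corpus) == 1  # the undated record is excluded
```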
What data sources are considered to be restricted?
These data sources contain near-ground-truth information or expert judgment on specific test set questions and are therefore disallowed:
- Forecasting sites and similar (Metaculus, GoodJudgment, CSET, Hypermind, PredictIt, etc.). For example, comments, source links, and crowd forecasts of test questions cannot be used. (Note: the Autocast dataset was released with the source links and crowd forecasts even for test questions; ONLY QUESTIONS RESOLVED BEFORE May 11th, 2021 CAN BE USED.)
- Human (super)forecasters’ predictions on test questions.
- Insider information related to the outcome of events being predicted.
Evaluation
For true/false and multiple-choice questions, we evaluate models using the Brier score, divided by 2 so that it is normalized between 0% and 100%. For numerical questions, we use the L1 distance, bounded between 0% and 100%. We denote these question types as T/F, MCQ, and Numerical, respectively. To evaluate aggregate performance, we use a combined metric (T/F + MCQ + Numerical), which has a lower bound of 0%; lower is better, and a score of 0% indicates perfect prediction on all three question types. For more details, please check out the Autocast paper.
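For reference, here is a minimal sketch of the per-question scores as we read them from the description above, assuming one-hot targets for T/F and MCQ and numerical answers rescaled to [0, 1]; the official evaluation code and the Autocast paper are authoritative.

```python
import numpy as np

def brier_score(probs: np.ndarray, label: int) -> float:
    """Multiclass Brier score divided by 2, so it lies in [0, 1] (0% to 100%).

    probs: predicted probability for each answer choice (sums to 1).
    label: index of the correct choice.
    """
    target = np.zeros_like(probs)
    target[label] = 1.0
    return float(np.sum((probs - target) ** 2) / 2)

def l1_distance(pred: float, answer: float) -> float:
    """L1 distance for numerical questions, assuming values scaled to [0, 1]."""
    return abs(pred - answer)

# A perfect prediction scores 0%; a uniform two-class guess scores 25%.
assert brier_score(np.array([1.0, 0.0]), 0) == 0.0
assert brier_score(np.array([0.5, 0.5]), 0) == 0.25
```

Under this reading, the combined metric sums the three per-type scores, which is why its lower bound of 0% is attained only with perfect predictions on every question type.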
Overview
The dataset may grow in size in the future. Prizes and precise rules may be adjusted correspondingly.
Warm-up Round
Format: Submissions can be made at any time on CodaLab from the start date. Teams will compete on the combined metric defined above. No submissions will be accepted after the end date, and winners are determined by their best submission.
Start date: Sep 14th, 2022
End date: Apr 17th, 2023
Future Rounds
Format: More details will be announced soon.
Terms and Conditions
We cannot give awards to teams with members on US terrorist lists or those subject to sanctions. Sponsor may confirm the legality of sending prize money to winners who are residents of countries outside of the United States. Only authors on awarded papers are winners. All decisions of judges are final. Winners are responsible for the legality of accepting the prize in their country. All taxes are the responsibility of the winners. Employees of the funding party and contest organizers are not eligible to win prizes. Entrants must be over the age of 18. By entering the contest, entrants agree to the Terms & Conditions. Entrants agree that the funding party shall not be liable to entrants for any type of damages that arise out of or are related to the contest and/or the prizes. By submitting an entry, entrant represents and warrants that, consistent with the terms of the Terms and Conditions: (a) the entry is entrant’s original work; (b) entrant owns any copyright applicable to the entry; (c) the entry does not violate, in whole or in part, any existing copyright, trademark, patent or any other intellectual property right of any other person, organization or entity; (d) entrant has confirmed and is unaware of any contractual obligations entrant has which may be inconsistent with these Terms and Conditions and the rights entrant is required to have in the entry, including but not limited to any prohibitions, obligations or limitations arising from any current or former employment arrangement entrant may have; (e) entrant is not disclosing the confidential, trade secret or proprietary information of any other person or entity, including any obligation entrant may have arising from any current or former employment, without authorization or a license; and (f) entrant has full power and all legal rights to submit an entry in full compliance with these Terms and Conditions.