Self-Service
Crowdsourcing at your fingertips
"At thumbtack.com we have tons of services that we needed to categorize (handyman, photographer, babysitter, etc). CrowdFlower made it easy to build a very accurate training set for machine learning. Now we're able to categorize the whole universe of local services without paying for each task. CrowdFlower has allowed us to scale infinitely for one small up-front cost."
"We needed to quickly and accurately categorize geotagged Flickr photos into garbage/not-garbage, and later into garbage subtypes such as plastic/glass/paper/etc for an interactive map. Using CrowdFlower, we were able to process 43,000 photos for a fraction of what we anticipated to spend! Everything worked so well, that I plan on using CrowdFlower again next week for a SEO/SEM/semantics related task in another organization."
"Bizo enjoys using CrowdFlower! We’ve found everything from the user interface, to their API, easy & intuitive to use. In addition, CrowdFlower’s customer support is second to none - all of our questions and concerns were addressed immediately."
"My experience using CrowdFlower is a real pleasure, empowering me to share insightful information about something I care passionately about. After considering a few other options, I decidedly chose CrowdFlower for its ease of use, ongoing flexibility, and efficiency. Read more about my experience here!"
CrowdFlower's Self-Service platform allows you to easily design and manage your crowdsourcing projects while ensuring a speedy completion time and quality results.
What is it good for?
The Self-Service platform is best suited for data-centric projects that can be broken into many simple, repetitive tasks, which lets us distribute the job to many contributors simultaneously. To ensure the project yields quality responses, we recommend that the contributor form ask at least one objective question, which can be used to test contributors while the job runs. That said, a wide variety of projects run on Self-Service, including art projects, surveys, research, and much more.
Sample project types: Annotation, Categorization, Content Moderation, Find/Verify, Judge/Relevance, Sentiment Analysis (more info in our Best Practices)
What is it not good for?
Self-Service is not a great fit for highly complex projects that require a lot of experience (e.g., programming, article writing, design work). Spam and traffic-building projects (jobs that ask contributors to vote, "like", back-link, etc.) are not permitted on the platform because they encourage cheating and draw a negative response from our contributors.
Some project types that typically do not fit well: Content Generation (See our Enterprise offering), Translation, Transcription
How it works:
There are 5 basic steps to running a job on Self-Service:
- Upload your data — Upload the data that you need contributors to process. You may also create a job with no data.
- Create the contributor form and instructions — Use our form editors to create the interface that contributors will use to complete the job.
- Create Gold Standard Data — Complete a small number of tasks yourself, or upload a spreadsheet containing data with the correct responses. We'll use these units to ensure you are obtaining high-quality responses.
- Order judgments — Set the price and post the job to the workforce.
- Retrieve your results — Results are automatically compiled, stored and aggregated upon job completion.
For more information, please see our tutorial and FAQ.
Some of the features available to Self-Service:
- Gold Standard Data — Ensure accurate results without having to manually reject individual contributors and judgments as the job completes. Visit our Gold documentation for more details.
- Form Editors — We offer editors for both novices and experts. Use our Graphical Editor to quickly and easily add questions to your form, or use our CML editor to customize your form with complex logic, CSS, JavaScript, and more.
- Contributor Channels — Increase the speed of your job by running on multiple contributor channels simultaneously.
- Aggregation/reporting — Download your results at any point while the job runs. We offer an aggregated report, which compiles all judgments into the most trusted response along with a confidence rating, and a full report, which contains every judgment submitted in your job.
- API — Completely automate your process with our fully functional API. For more info, please see our API documentation.
- Metrics — Access real-time metrics on your job and contributors while the job runs.
- Testing interface — Use the CrowdFlower Internal Interface to test your job and submit judgments in it.
And much more!
Contact:
Have an inquiry directly related to Self-Service? Email selfservice@crowdflower.com
Self-Service Tutorial
CrowdFlower enables customers to process large amounts of data in a short period of time by breaking a repetitive job into many small pieces and distributing them to many contributors simultaneously. This tutorial will show you how to create and run a job on the CrowdFlower platform. There are five basic steps to running jobs:
- Upload your data — Upload the data that you need contributors to process.
- Create the contributor form and instructions — Use our form editors to create the interface that contributors will use to complete the job.
- Create Gold Standard Data — Complete a small number of tasks yourself, or upload a spreadsheet containing data with the correct responses. We'll use these units to ensure you are obtaining high-quality responses.
- Run the Job — Set the price and post the job to the workforce.
- Retrieve your results — Results are automatically compiled, stored and aggregated upon job completion.
In this example, we will categorize a list of 110 companies.
Uploading Data
Although you can import data in a number of ways, the most common method is to upload a spreadsheet.1 We recommend uploading a UTF-8 encoded CSV that includes only the information that will help contributors complete the task. If your job contains no data (e.g., a survey), you can select "Create a Job With No Data." In this example, we'll upload a CSV file containing the name and URL of each business, each on a separate row. Once uploaded, each row in the spreadsheet will become an individual task. Make sure that the first row of your spreadsheet contains column headers.
1 You may also pull a data feed (e.g., a Twitter feed) or upload JSON via the API.
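If you generate the upload file from a script, a minimal sketch using Python's standard `csv` module might look like the following. The file name, the column names (`company_name`, `url`), and the sample rows are hypothetical; what the sketch illustrates is the UTF-8 encoding, the header row, and one data row per task.

```python
import csv
import os
import tempfile

# Hypothetical sample data: one dict per business, i.e., one future task each.
companies = [
    {"company_name": "Acme Plumbing", "url": "http://example.com/acme"},
    {"company_name": "Café Lumière", "url": "http://example.com/cafe"},
]


def write_upload_csv(path, rows, fieldnames):
    """Write rows to a UTF-8 CSV whose first row holds the column headers."""
    # newline="" lets the csv module manage line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()    # first row: column headers
        writer.writerows(rows)  # every following row becomes one task


upload_path = os.path.join(tempfile.gettempdir(), "companies.csv")
write_upload_csv(upload_path, companies, ["company_name", "url"])
```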
Creating the Contributor Form
The contributor form houses the uploaded data, displaying information dynamically for each unit a contributor sees. This is also the location where contributors view the instructions and submit responses. In this example, we've decided to display the company name and URL, as well as a set of radio buttons to capture contributor input.

There are two editors available for creating the contributor form. The graphical editor is best suited for simple jobs that do not contain complex features such as logic, custom formatting, or JavaScript. The example above is well suited for the graphical editor. Our CML editor lets you use code to implement special contingencies (logic, HTML, JavaScript, CSS, etc.) in your form. We recommend reading our CML documentation before using the CML editor.
Creating Gold Standard Data
After you upload the data and create the contributor form, you should use CrowdFlower's automated quality control technology. You can greatly improve accuracy by creating a set of hidden tests that are randomly distributed among the regular units a contributor completes. We refer to these tests as "Gold Standard Data." By saving the correct answers to a small set of units before running a job, we can record the quality of each contributor's performance and reject a contributor once his or her accuracy drops below a defined threshold. You can create Gold units individually with our Gold-digging tool (below), or in bulk via a spreadsheet.
You are permitted to save more than one response for each field in the contributor form. When creating Gold, you should be 100% certain that you have accounted for all correct responses; you do not want to punish good contributors for submitting acceptable responses. A Gold reason is displayed when a contributor misses a Gold unit, and these reasons help train contributors. For more information on Gold, please visit our Gold documentation and our FAQ.
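To make the effect of saving several correct responses and a Gold reason concrete, here is a minimal Python sketch of a Gold check. The `GoldUnit` class and `check_gold` function are hypothetical and not part of any CrowdFlower API, and the exact string comparison is an assumption; the platform's own matching rules may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class GoldUnit:
    # Every response you consider correct; saving several keeps good
    # contributors from being penalized for an acceptable alternative.
    accepted: set[str]
    reason: str = ""  # Gold reason shown when a contributor misses the unit


def check_gold(unit: GoldUnit, response: str) -> tuple[bool, str | None]:
    """Return (passed, message); message is the Gold reason on a miss."""
    # Assumption: exact string comparison against the saved responses.
    if response in unit.accepted:
        return True, None
    return False, unit.reason


unit = GoldUnit({"Plumber", "Handyman"}, "The site lists plumbing and repair services.")
print(check_gold(unit, "Handyman"))      # (True, None)
print(check_gold(unit, "Photographer"))  # (False, 'The site lists plumbing and repair services.')
```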
Running the Job
After you have designed the contributor form and created Gold, you will be ready to run the job. You can do so from the Actions menu, available on any page.

This directs you to an interface that allows you to calibrate price and choose the contributor channels you wish to run on. There are no minimums when setting pay, but note that contributors will not agree to work on your job if the pay is too low. Running jobs on CrowdFlower is often an iterative process. You may need to adjust pay several times when starting a project.
After you have calibrated the price, you must choose the contributor channels you wish to run your job on. Then you will be ready to start collecting judgments.
Obtaining Judgments
Judgments usually start trickling in within the first couple of minutes of run time. The Overview page acts as the dashboard and is the best place to check the status of a running job.
Note that our job has already started to collect trusted and untrusted judgments. By default, a contributor must see a minimum of four Gold units before the system evaluates his or her accuracy. Contributors who have seen four Gold units and whose accuracy falls below 70% will be notified that they can no longer participate in a job. Their judgments will then be converted to untrusted. Only trusted judgments (from contributors with an accuracy of 70% or higher) will be included in the job's results.
The contributor page provides information on each contributor in the job.
It appears that our untrusted contributor (in red) was automatically stopped after seeing four Gold units. Note that you have the ability to forgive a missed response to a Gold unit. This will immediately raise the contributor's accuracy. If the contributor entered a reasonable response and was flagged for it, be sure to save the response in the Gold so other contributors will not be punished as well.
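The default rules described above (at least four Gold units before evaluation, a 70% accuracy threshold) can be sketched as a small Python function. This is an illustration under stated assumptions, not CrowdFlower's implementation: `contributor_status` is a hypothetical name, and treating a contributor with no Gold answers yet as trusted is an assumption.

```python
def contributor_status(correct: int, seen: int,
                       min_gold: int = 4, threshold: float = 0.70) -> str:
    """Classify a contributor from Gold results using the defaults described above."""
    if seen == 0:
        return "trusted"  # assumption: no Gold evidence yet
    accuracy = correct / seen
    if accuracy >= threshold:
        return "trusted"  # 70% or higher: judgments count toward results
    if seen >= min_gold:
        return "removed"  # evaluated and below threshold: stopped, judgments untrusted
    return "untrusted"    # below threshold before evaluation; may still recover


print(contributor_status(correct=3, seen=4))  # trusted (75%)
print(contributor_status(correct=2, seen=4))  # removed (50% after four Gold units)
print(contributor_status(correct=1, seen=2))  # untrusted (50%, not yet evaluated)
```

In this model, forgiving a missed Gold response amounts to incrementing `correct`, which is one way to see why forgiveness immediately raises a contributor's accuracy.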
Retrieving and Interpreting the Results
Reports will automatically be generated upon job completion. You can generate and download your reports on the "Reports" page.
There are three reports to choose from:
- Aggregated CSV: Includes one result for every Unit. Since multiple contributors answer each unit, the aggregated CSV aggregates contributors' judgments based on individual contributors' trust ratings. This CSV is what you are likely to interact with the most.
- Full CSV: Includes every judgment made by every contributor. Contains (among other things) contributor IDs, trust rating, labor channel, and (of course) his or her judgments. You'll use this CSV to dive deeply into the data to understand patterns (e.g., maybe all contributors from a specific region are providing poor responses, but you wouldn't be able to tell that from the Aggregated CSV).
- JSON: The results in JSON (JavaScript Object Notation) format.
A sample aggregated CSV from our finished job:

The results will contain the response that the system has determined to be correct, based on contributor agreement and contributor accuracy. A confidence score is included with every field in the form. This numerical value indicates how confident we feel that the result is correct.
Within an hour we have obtained an accurate list of categorized companies. The use of Gold Standard Data, along with a thoughtfully designed contributor interface, has allowed us to obtain quality results without having to scrutinize every contributor's responses. This tutorial highlighted only the basic features provided by CrowdFlower. For more information on the features available to you, please read the CrowdFlower documentation. If you don't have an account, you can sign up now.
Technical FAQ
What is Gold Standard Data?
Gold Standard Data is a set of hidden tests that are randomly distributed through the tasks that a contributor completes. By saving the correct answers to a small set of Units prior to running a job, we track the quality of a contributor's performance and reject a contributor once his or her accuracy drops below a defined threshold. When no Gold Units are inserted, the quality of work plummets. You can create Gold by providing answers to your questions in the Gold Digging Interface. These answers are recorded and randomly distributed throughout the job. For a detailed discussion on Gold, please visit the Gold Documentation.
Things to know:
- Contributors are notified when they miss a Gold Unit. You can include information about the Gold Unit to help contributors better understand what is expected of them. Use Gold Reasons to train contributors to meet your standards.
- Make sure that your Gold isn't too hard in comparison to the majority of questions in your job.
How much Gold should I create?
As a rule of thumb, we suggest creating Gold from 5-10% of your dataset. Your contributors should not see a Gold Unit enough times to be able to recognize it. But if you have 10,000 Units, don't feel compelled to dig 1,000 Gold Units. As long as you are confident that contributors will not be able to pick up on patterns (ratio of responses, remembering specific Units, etc.), your job should run smoothly.
You can also use the Max. Work Per Contributor feature (found in the Advanced Options under the Data tab) to restrict the number of judgments each contributor can submit. Note that this can reduce throughput.
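The 5-10% rule of thumb is simple arithmetic. The hypothetical helper below computes it with integer ceiling division, so small datasets round up to a whole Gold unit and no floating-point rounding creeps in.

```python
def suggested_gold_range(n_units: int, low_pct: int = 5, high_pct: int = 10) -> tuple[int, int]:
    """Return the (low, high) number of Gold units for the 5-10% rule of thumb."""
    # Integer ceiling division, -(-a // b), avoids floating-point rounding.
    low = -(-n_units * low_pct // 100)
    high = -(-n_units * high_pct // 100)
    return low, high


print(suggested_gold_range(110))     # (6, 11) for the 110-company tutorial job
print(suggested_gold_range(10_000))  # (500, 1000)
```

For large datasets, treat the upper bound as a ceiling rather than a target, as the answer above notes.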
Can I make Gold via a spreadsheet?
Yes. The process is covered in detail in Learning Resources.
Can you provide some tips on creating a job?
CrowdFlower is compatible with a wide array of job types, each with a different set of goals and processes. Before you start the design process, it may help to answer the following questions:
- What problem are you trying to solve?
- What is an appropriate solution to this problem (what data do you want)?
- What questions do you need to ask to obtain the data?
- How do you want to ask these questions (i.e., what language will you use), and what type of form element (e.g., pulldown, text field, etc.) will you ask it with?
- In what order do you want to ask these questions?
- How might these questions behave with each other dynamically (does this form require logic or contingencies)?
- Have you provided enough background information?
- Do your instructions make sense?
- Do your questions elicit the specific answers that you want?
Drawing this out on paper as a table, decision tree, or whatever suits your needs will be a great boon to your quest. Planning ahead will help prevent you from creating an overly long questionnaire. Short, easy jobs are more common, while longer forms tend to become more expensive and reduce accuracy. If you need to have a lot of questions answered, consider breaking your job into separate jobs.
How do I upload data?
The easiest way to add data to a job is to upload a UTF-8 encoded CSV. Other formats (like .xls, .xlsx, .ods, etc.) sometimes work, but a UTF-8 CSV is the best bet for pain-free uploading. If you get errors when uploading a CSV, the most common cause is encoding. Excel doesn't always give you good control over encoding, so try OpenOffice, which gives you much more control: go to "Save As," change the file type to CSV, and then select Unicode (UTF-8) in the Character Set field. Make sure that the first row of your data file is a header row with concise, descriptive header names and no special characters (letters and numbers only, please). You may also copy an existing job with all or some of its Units. This is useful if you are running a job similar to a pre-existing one and need to make only a few edits, rather than creating a completely new design and form.
You can also copy a job with pre-existing Gold. This allows you to add new data and collect another batch of judgments within the exact same form. Keep in mind that you can also add more data to an existing job. As long as all of your headers match, the process is exactly the same as adding your initial set. In any case, make sure to count the number of Units processed after you upload data to ensure that all of your new Units have uploaded correctly. It is not completely unheard of for Units to get lost in transit.
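If spreadsheet exports keep failing, the same fix can be scripted. The sketch below is a hypothetical Python helper, not a CrowdFlower tool. It assumes the source file is Windows-1252 (`cp1252`, often what Excel writes on Western-language Windows systems; adjust `src_encoding` as needed), re-encodes it as UTF-8, rejects headers with special characters, and returns the data-row count so you can compare it with the number of Units shown after upload. Allowing underscores in headers is an assumption based on the Gold column names used elsewhere in this documentation.

```python
import csv
import re

# Assumption: letters, digits, and underscores are safe in header names.
HEADER_OK = re.compile(r"^[A-Za-z0-9_]+$")


def prepare_csv(src_path, dst_path, src_encoding="cp1252"):
    """Re-encode a CSV as UTF-8, validate headers, and return the data-row count."""
    with open(src_path, encoding=src_encoding, newline="") as src:
        rows = list(csv.reader(src))
    header, data = rows[0], rows[1:]
    bad = [h for h in header if not HEADER_OK.match(h)]
    if bad:
        raise ValueError(f"Headers should use letters and numbers only: {bad}")
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        csv.writer(dst).writerows(rows)
    return len(data)  # compare with the Unit count shown after upload
```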
How much will my job cost?
With Self-Service, you determine the cost of a job. After you've designed the job and it's ready to receive judgments, you will calibrate the price. Once you have set a price that fits your needs, CrowdFlower adds a markup of 33%. The quote you are provided before running the job includes this markup. There are no monthly fees or minimums.
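For budgeting, a rough estimate can be sketched in Python. The cost model here is an assumption, not CrowdFlower's quoting logic: it treats the price as pay per judgment on a Source Unit, ignores judgments on Gold Units, and applies the 33% markup on top of contributor pay. The quote shown before you launch remains the authoritative figure.

```python
from decimal import Decimal, ROUND_HALF_UP


def estimate_cost(source_units: int, judgments_per_unit: int,
                  pay_per_judgment: Decimal,
                  markup: Decimal = Decimal("0.33")) -> Decimal:
    """Rough estimate: contributor pay plus a percentage markup, rounded to cents."""
    contributor_pay = source_units * judgments_per_unit * pay_per_judgment
    total = contributor_pay * (1 + markup)
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Hypothetical numbers: 110 companies, 3 judgments each, 2 cents per judgment.
print(estimate_cost(110, 3, Decimal("0.02")))  # 8.78
```

`Decimal` is used instead of `float` so that currency amounts do not pick up binary rounding errors.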
How can I test my job? Can I access my job as a contributor?
You have access to the CrowdFlower Internal Interface, which allows you and your team to access your job and submit responses as contributors. You will find the Internal Interface in the list of available contributor channels when ordering judgments. After you run the job on this channel, a link to the contributor interface will appear on the Overview page. Gold will be present in the job, and your responses will be saved and made available in the results. If you are logged in to CrowdFlower, your username will act as the contributor ID; otherwise, you will be prompted to enter an email address. The Internal Interface is free of charge.
How do I display my data on the contributor form? Can I display data in a CML element?
There are two ways to apply your data to a contributor form. After you've uploaded your data, select the Edit tab.
- In the Graphical Editor, select the middle formatting box containing the menu bar. A list of your data fields will appear on the left side of the screen. Simply select which fields you would like to insert, and your data will display dynamically in the job.
- In the CML editor, the 'insert field' drop-down menu will allow you to select from a list of available data fields. Once again, select which fields you would like to insert, and your data will display dynamically in the job.
You can display data in a CML element by entering the {field_name} in the 'label' attribute of a parent or child tag.
How do I retrieve my results after the job has finished?
On the Overview page, you will find a drop-down menu displaying the text "Finished" (or "Paused," "Cancelled," or "Not Ordered" if the job has not completed). When you hover over this menu, you will see three options in the "Download" section.
- Aggregated CSV: Includes one result for every Unit. Since multiple contributors answer each Unit, the Aggregated CSV aggregates contributors' judgments based on individual contributors' trust ratings. This CSV is what you are likely to interact with the most.
- Full CSV: Includes every judgment made by every contributor. Contains (among other things) contributor IDs, trust rating, city/country, labor channel, and (of course) his or her judgments. You'll use this CSV to dive deeply into the data to understand patterns (e.g., maybe all contributors from a specific region are providing poor responses, but you wouldn't be able to tell that from the Aggregated CSV).
- JSON: The results in JSON (JavaScript Object Notation) format.
How does the job appear on Amazon Mechanical Turk and other sites? Does our company name show up as the requester?
All user information is kept confidential. Jobs that are uploaded to our channels are displayed only as CrowdFlower jobs; contributors cannot see information about the requester.
How is contributor trust calculated?
Contributor trust is calculated based on a contributor's performance on Gold questions within a specific job.
In the aggregate results, how is the confidence level calculated?
Confidence is determined by combining the accuracy of contributors with the agreement of the responses on a specific Unit.
Why am I unable to edit projects that have already been ordered?
Editing a running job is not allowed, because responses made before a change would not be adjusted automatically and your results would become inconsistent. We recommend cancelling a problematic job, copying it to a new job, and then editing the copy. When you cancel a running job, any unused credits left in the job are automatically refunded to your account.
Can I set the number of Gold Units a contributor sees?
Yes. Read about Assignments (below) to learn how.
How is Gold distributed throughout the job?
Jobs are broken up into Source Units, Gold Units, and Assignments.
- Source Units are regular data Units that will gather judgments from contributors. It may be helpful to consider these as one row of data in your spreadsheet. If your job were asking contributors to choose colors for dresses, and you needed 2,000 dresses to be judged, a Source Unit would be considered one dress.
- Gold Units are Source Units that have been converted to Gold by saving the pre-defined answers to questions within the Unit via the Gold Digging Interface. These questions will serve as tests in the running job. When a contributor's answer differs from what is set as the Gold response, the contributor will be flagged as incorrect. You can learn more about Gold here: http://crowdflower.com/docs/gold.
- An Assignment is the set of Units that a contributor sees on one page. When you are ready to post a job to a contributor channel, you will be asked to define the number of judgments you would like to gather per Unit, as well as how many Units contributors will see per Assignment. ONE Gold Unit is randomly placed in each Assignment, which gives you control over how much Gold contributors see. If you would like contributors to see 1 Gold Unit for every 3 Source Units, set Units Per Assignment to 4. Contributors will then complete four Units, submit the results, and be notified if they missed the Gold Unit in the Assignment.
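The relationship between Units Per Assignment and Gold can be illustrated with a short Python sketch. `build_assignments` is a hypothetical helper, not how the platform actually schedules work; for simplicity it picks a Gold Unit at random for each page and does not prevent a contributor from seeing the same Gold Unit twice.

```python
import random


def build_assignments(source_units, gold_units, units_per_assignment, rng=random):
    """Split Source Units into pages of (units_per_assignment - 1) plus one Gold Unit."""
    per_page = units_per_assignment - 1  # one slot on each page is reserved for Gold
    if per_page < 1:
        raise ValueError("units_per_assignment must be at least 2")
    assignments = []
    for start in range(0, len(source_units), per_page):
        page = list(source_units[start:start + per_page])
        gold = rng.choice(gold_units)                    # pick one Gold Unit
        page.insert(rng.randrange(len(page) + 1), gold)  # at a random position
        assignments.append(page)
    return assignments


# Units Per Assignment = 4 gives 1 Gold Unit for every 3 Source Units.
pages = build_assignments([f"s{i}" for i in range(9)], ["g1", "g2"], 4, random.Random(0))
for page in pages:
    print(page)
```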
I noticed that the Gold Units I have set to "Super Easy" are being seen more than my other Gold Units. Are Gold Units sorted by difficulty?
Yes, when setting the difficulty of Gold Units, easier Units will be seen first. This serves as a bit of a training method, allowing the contributor to be familiar with the task before seeing more difficult questions. You can set all Gold Units to the same difficulty level to ensure Gold Units will be seen randomly. Difficulty will automatically change slightly when a Gold Unit receives contentions from contributors. This is to ensure that jobs will run as smoothly as possible and that contributors do not see poorly created Gold Units.
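The ordering described above, easier Gold first with random order among Units of equal difficulty, can be sketched as a sort with a random tie-breaker. The numeric difficulty scale (lower is easier) and the `gold_order` helper are assumptions for illustration.

```python
import random


def gold_order(gold_units, rng=random):
    """Return unit IDs with easier Gold first; equal difficulty is shuffled.

    gold_units: list of (unit_id, difficulty) pairs; lower difficulty = easier.
    """
    keyed = [(difficulty, rng.random(), unit_id) for unit_id, difficulty in gold_units]
    keyed.sort()  # sorts by difficulty, then by the random tie-breaker
    return [unit_id for _, _, unit_id in keyed]


print(gold_order([("g_hard", 2), ("g_easy", 0), ("g_medium", 1)]))
# ['g_easy', 'g_medium', 'g_hard']
```

Setting every Gold Unit to the same difficulty reduces this to a random shuffle, matching the advice above.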
At what point does CrowdFlower decide to drop contributors from working on a job?
When a contributor's accuracy drops below the accuracy threshold (the default is 70%), their work is converted from Trusted to Untrusted, which excludes their judgments from your results. A contributor who has responded to 4 Gold Units will be automatically removed from a job if their accuracy drops below the set accuracy threshold.
I set "Judgment per Unit" to 3, however some Units received 4 Trusted Judgments. Why is this?
Before seeing 4 Gold Units, contributors sometimes move back and forth between Trusted and Untrusted based on their accuracy. While a contributor is Untrusted, the system excludes their judgments from the results and collects additional judgments to meet the required number. When that contributor moves back to Trusted, their judgments count again, so the Unit can end up with an extra Trusted judgment.
Can I set restrictions on contributors?
By selecting the Advanced Options button under the Edit tab, you will find a few options at your disposal. Jobs can be set to admit only contributors from a specific country. Alternatively, you can choose to exclude specific countries. You may also limit how much of your job each contributor can see by setting the Max. Work Per Contributor feature.
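To show how these restrictions combine, here is a hypothetical eligibility check in Python. The function name, its parameters, and the use of two-letter country codes are assumptions for illustration; the real settings live in the Advanced Options panel.

```python
def is_eligible(country, judgments_so_far, include=None, exclude=frozenset(), max_work=None):
    """Return True if a contributor may keep working on the job.

    include:  if given, only these countries are admitted.
    exclude:  countries that are always rejected.
    max_work: Max. Work Per Contributor, counted here in judgments.
    """
    if include and country not in include:
        return False
    if country in exclude:
        return False
    if max_work is not None and judgments_so_far >= max_work:
        return False
    return True


print(is_eligible("US", 10, include={"US", "CA"}))  # True
print(is_eligible("FR", 10, include={"US", "CA"}))  # False: not on the include list
print(is_eligible("US", 50, max_work=50))           # False: work limit reached
```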
Learning Resources
This page is intended to provide learning resources for the most common job types and topics we see on CrowdFlower's Self-Service platform. Here you can learn basic strategies that have worked well in the past and utilize the many features and tools available to you on CrowdFlower. Your contributions are welcome. Please contact us at selfservice@crowdflower.com if you have something to add.
Best Practices By Job Type
How to Read Your Results
How to Create Gold With a Spreadsheet
Best Practices by Job Type
How to Read Your Results
Aggregated Results
The Aggregated Results CSV shows the dominant response for each form element (question) in your job, as well as all source data presented to contributors or included in the original unit of data. Note that the column headers for contributor responses are pulled from the name (or label) attribute of the form element in question. Each aggregated response has an associated confidence value, which measures the agreement among contributors on a scale from 0 to 1. For example, if all contributors answer “Yes” to a given question, the confidence value is 1. Confidence values are also weighted by each contributor’s individual Trust Score, which reflects that contributor’s accuracy on Gold units.
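The idea of a trust-weighted confidence score can be illustrated with a small Python sketch. It is a simplified model consistent with the description above (scores run from 0 to 1, and unanimous agreement gives 1), not CrowdFlower's exact formula, which this page does not specify. `aggregate` is a hypothetical helper, and ties go to whichever answer appears first.

```python
from collections import defaultdict


def aggregate(judgments):
    """Return (dominant_answer, confidence) from (answer, trust) pairs.

    Confidence = total trust behind the winning answer / total trust overall.
    """
    weight = defaultdict(float)
    for answer, trust in judgments:
        weight[answer] += trust  # each vote counts in proportion to its Trust Score
    total = sum(weight.values())
    if total == 0:
        return None, 0.0
    best = max(weight, key=weight.get)  # ties: first answer encountered wins
    return best, weight[best] / total


print(aggregate([("Yes", 0.9), ("Yes", 0.8)]))               # ('Yes', 1.0): unanimous
print(aggregate([("Yes", 1.0), ("No", 1.0), ("Yes", 1.0)]))  # ('Yes', 0.6666666666666666)
```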
The following table shows common headers in an Aggregated Results CSV and how to interpret them.
| Column Header | Description |
|---|---|
| _unit_id | Unique identifier in the CrowdFlower system for each unit of data. |
| _golden | TRUE if the unit in question is a Gold Standard unit. |
| _trusted_judgments | Number of trusted judgments collected for this unit. Note that because we send out units to many contributors at once, you may notice that some units have more trusted judgments than you ordered. |
| {{field_name}} | Aggregate contributor response fields, where each column header is taken from the name attribute of the relevant form element. This will show the dominant response (result) for each form element from trusted contributors. |
| {{field_name}}:confidence | Measures agreement among contributors for the dominant response. Note that this score is weighted by the individual Trust Score of each contributor. |
| {{field_name}}:confidence_summary | Shows each unique trusted contributor response and the associated confidence value for form elements that are not aggregated. |
| {{field_name}}_gold | Gold Standard (correct) response for this unit, if the unit has been defined as Gold (i.e., if “_golden” equals TRUE). |
| {{field_name}}_gold_reason | Message a contributor sees after answering a specific Gold unit incorrectly. |
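A common first pass over the Aggregated Results CSV is to pull out non-Gold units whose answers have low confidence so they can be reviewed or re-run. The Python sketch below assumes a form field named `category` and a 0.7 cutoff, both hypothetical choices, and relies only on the `_unit_id`, `_golden`, `{{field_name}}`, and `{{field_name}}:confidence` columns described in the table.

```python
import csv


def low_confidence_units(path, field, min_confidence=0.7):
    """Yield (_unit_id, answer, confidence) for non-Gold rows below the cutoff."""
    with open(path, encoding="utf-8", newline="") as f:
        for row in csv.DictReader(f):
            if (row.get("_golden") or "").strip().upper() == "TRUE":
                continue  # Gold units already have known answers
            confidence = float(row[f"{field}:confidence"])
            if confidence < min_confidence:
                yield row["_unit_id"], row[field], confidence


# Hypothetical usage with a downloaded report:
# for unit_id, answer, conf in low_confidence_units("aggregated.csv", "category"):
#     print(unit_id, answer, conf)
```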
Full Results
The Full Results CSV shows each individual trusted judgment collected for your job, as well as all source and Gold Standard data included in the original unit.
Each judgment will also contain the Trust Rating for the contributor that provided this judgment. The Trust Rating reflects a contributor’s overall accuracy on Gold Standard data in this job.
The following table shows common headers and how to interpret them.
| Column Header | Description |
|---|---|
| _unit_id | Unique identifier in the CrowdFlower system for the unit |
| _created_at | Date and time that the judgment on the unit was submitted by the contributor |
| _golden | Indicates whether a judgment was on a Gold Standard unit |
| _id | Unique identifier in the CrowdFlower system for the judgment |
| _missed | TRUE indicates an incorrect judgment made on a Gold unit. FALSE indicates an incorrect judgment made on a Gold unit that was later forgiven |
| _started_at | Date and time that the judgment on the unit was initiated by the contributor |
| _tainted | TRUE indicates that the judgment is untrusted. Note that untrusted judgments are not included in the results by default. |
| _channel | Labor channel of the contributor who submitted the judgment |
| _trust | Trust Score of the contributor who submitted the judgment, reflecting overall accuracy on Gold Standard data in this job |
| _worker_id | Unique identifier in the CrowdFlower system for the contributor providing the judgment |
| _country, _region, _city | Geographic location of the contributor supplying the judgment, based on IP address |
| {{field_name}} | Contributor response fields, where each column header is taken from the name attribute of the relevant form element. This will show the response for each form element |
| {{field_name}}_worker_input | For fields that have certain validators, this column will show the original contributor response before validation |
| {{field_name}}_gold | Gold Standard (correct) response for this unit, if the unit has been defined as Gold (i.e., if “_golden” equals TRUE). |
| {{field_name}}_gold_reason | Message a contributor sees after answering a specific Gold unit incorrectly. |
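As an example of the pattern-hunting the Full CSV allows, the sketch below computes, per `_country`, the share of Gold judgments that were missed (`_missed` equal to TRUE). Because forgiven misses are marked FALSE, they count as passes here. The helper name is hypothetical, and the exact TRUE/FALSE spelling in your export is an assumption, so the comparison is case-insensitive.

```python
import csv
from collections import Counter


def gold_miss_rate_by_country(path):
    """Return {country: share of Gold judgments with _missed == TRUE}."""
    seen, missed = Counter(), Counter()
    with open(path, encoding="utf-8", newline="") as f:
        for row in csv.DictReader(f):
            if (row.get("_golden") or "").strip().upper() != "TRUE":
                continue  # only Gold judgments have a known correct answer
            country = row.get("_country") or "unknown"
            seen[country] += 1
            if (row.get("_missed") or "").strip().upper() == "TRUE":
                missed[country] += 1  # forgiven misses are FALSE, so not counted
    return {country: missed[country] / seen[country] for country in seen}


# Hypothetical usage with a downloaded Full CSV:
# for country, rate in sorted(gold_miss_rate_by_country("full.csv").items()):
#     print(country, f"{rate:.0%}")
```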
How to Create Gold With a Spreadsheet
It's relatively easy to create Gold with a spreadsheet. The required steps are explained below.
1. Format your data

As seen above, your data should contain two gold columns for every field in your form that will have gold values. Name and fill them as follows:
"question_name" is the field's label with spaces converted to underscores and uppercase letters converted to lowercase — e.g., if your field's label is "Enter some information.", it should be converted to "enter_some_information" in your data. All punctuation should be excluded from the header. If using the CML Editor, the "name" attribute can be used as well.
"question_name_gold" should contain the correct answers for the field. If there is more than one correct answer for the field, the answers should be delimited with newline characters.
"question_name_gold_reason" should contain gold reasons for the field - optional (but highly recommended) explanations that will be shown to contributors who wrongly answer the field.
A column with the header "_golden" is also required, which will be used to change a unit's state to Golden. A unit should contain the value "TRUE" in this column if it is gold. All non-gold units should be left blank.
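If you build the spreadsheet programmatically, the naming rules above can be encoded in two small Python helpers. `label_to_name` and `gold_row` are hypothetical, and the regular expression is one reasonable reading of "exclude all punctuation"; check the resulting headers against your form before uploading.

```python
import re


def label_to_name(label):
    """'Enter some information.' -> 'enter_some_information'."""
    no_punct = re.sub(r"[^\w\s]", "", label.lower())  # drop punctuation
    return "_".join(no_punct.split())                 # spaces -> underscores


def gold_row(data, label, answers=None, reason=""):
    """Return one spreadsheet row; answers=None marks a non-gold unit."""
    name = label_to_name(label)
    is_gold = answers is not None
    row = dict(data)
    row["_golden"] = "TRUE" if is_gold else ""  # non-gold units stay blank
    # Several correct answers are joined with newline characters.
    row[f"{name}_gold"] = "\n".join(answers) if is_gold else ""
    row[f"{name}_gold_reason"] = reason if is_gold else ""
    return row


print(label_to_name("Enter some information."))  # enter_some_information
```

Rows built this way can be written with `csv.DictWriter`, which quotes cells containing newlines, so multi-answer gold cells stay intact in the file.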
2. Upload the file and flag your gold data
After the file has been uploaded, select "Convert Uploaded Gold" to set the state of all units with "TRUE" specified in the "_golden" column to gold. Use this to flag your gold units instantly when uploading your data. In the example above, three gold units would be created.
3. Link your Gold data to your form
Using the Graphical Editor:
Select "Link Uploaded Gold Data" to link the fields in your form to your uploaded data. Remember that your label must match the header in your data (excluding case and underscores) for them to link properly. E.g., "Enter some information" must be "enter_some_information_gold" in your data. Fields that have been linked properly will be highlighted gold in the Graphical Editor. An error message will be displayed if nothing is linked.
Using the CML editor:
You are permitted to set the "name" OR "label" attributes when linking your gold data, but note that "name" takes precedence if both attributes contain values. The "gold" attribute will need to be set to "true" (gold="true") as well for each field containing gold information.
E.g., <cml:text label="Please enter some information." name="enter_some_information" gold="true" />
Gold reasons will be automatically linked to the field.