Predictive coding: precision and recall
by RS  admin@robinsnyder.com : 1024 x 640


1. Predictive coding: precision and recall
PC (Predictive Coding) is a technique whose goal is to take a collection of possibly relevant documents and automatically determine which documents are relevant and which are not relevant to the task at hand. The task is often a legal proceeding in which the relevant documents need to be identified. Names that are often used in place of the term PC include the following.

2. Precision and recall


In PC terminology, the following terms are used. This is an important area in the field of IR (Information Retrieval).

3. Precision and recall
In simple terms, PC attempts to divide the document search space into two parts:

4. Not recalled
The "not recalled" documents are deemed "not relevant". A good PC system should put on the order of 98% of documents into this category (assuming there is a large collection of documents at the start of the process). The false negatives are that subset of the not-recalled documents that are actually relevant but not identified as such. The goal of PC is to make this subset as small and unimportant as possible.

5. Recalled
The "recalled" documents are deemed "relevant". A good PC system should put on the order of 2% of documents into this category. These documents still need to be reviewed by human reviewers to determine the actual relevancy.
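
The split into recalled and not-recalled documents is usually scored with the standard IR measures of precision (the fraction of recalled documents that are actually relevant) and recall (the fraction of actually relevant documents that were recalled). The following is a minimal sketch of those two measures; the function name and the document ids are hypothetical and chosen only for illustration.

```python
def precision_recall(recalled, relevant):
    """Return (precision, recall) for a set of recalled document ids
    measured against the set of actually relevant document ids."""
    recalled = set(recalled)
    relevant = set(relevant)
    true_positives = len(recalled & relevant)
    # Precision: of the documents we recalled, how many are relevant?
    precision = true_positives / len(recalled) if recalled else 0.0
    # Recall: of the relevant documents, how many did we recall?
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: 20 documents recalled, 25 actually relevant, 15 in both.
recalled_ids = set(range(0, 20))
relevant_ids = set(range(5, 30))
precision, recall = precision_recall(recalled_ids, relevant_ids)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up example the 5 recalled documents that are not relevant are the false positives, and the 10 relevant documents that were not recalled are the false negatives.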

6. PC system
A PC system attempts, for example, to require human review of only 2% of the total documents rather than 100% of the available documents.

In a legal case, there can be a lot of documents and reviewing them can be very time-consuming and thus very expensive.
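
To make the saving concrete, here is a small arithmetic sketch. The collection size and the minutes-per-document figure are assumptions picked for illustration, not numbers from any real case; only the 2% figure comes from the text above.

```python
def review_workload(total_documents, recalled_fraction, minutes_per_document):
    """Return (documents to review, hours for full review, hours with PC)."""
    recalled = round(total_documents * recalled_fraction)
    hours_full = total_documents * minutes_per_document / 60
    hours_pc = recalled * minutes_per_document / 60
    return recalled, hours_full, hours_pc


# Assumed: 1,000,000 documents, 2% recalled, 3 minutes of review per document.
recalled, hours_full, hours_pc = review_workload(1_000_000, 0.02, 3)
print(recalled, hours_full, hours_pc)
```

Under these assumptions, human review drops from 50,000 hours to 1,000 hours, which is the same factor of 50 as the ratio between 100% and 2%.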

7. Approaches
The two traditional approaches to PC, which are not mutually exclusive, are the following. In practice, aspects of both approaches can be used in a PC system.

8. Rules-based systems
A rule-based system has some aspect of "knowing" what the documents mean embedded into the rules. Machine learning, by contrast, compares documents without any embedded "knowledge" of what they mean, so it requires a learning set with which to start the comparisons. This is called seeding the system with known relevant documents. An approach is therefore needed for obtaining enough relevant material to form the basis of a good training sample. One way is to use a standard, suitable search query followed by human review in order to obtain some known relevant documents.
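
As a rough illustration of seeding, the sketch below ranks candidate documents by their similarity to a small set of known relevant seed documents. It assumes a simple bag-of-words representation and cosine similarity, which is one common choice and not necessarily what any particular PC system uses; the function names and sample texts are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn text into a bag-of-words term-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_seed(seed_docs, candidates):
    """Score each candidate by its best similarity to any seed document
    and return (doc_id, score) pairs, most similar first."""
    seed_vectors = [vectorize(d) for d in seed_docs]
    scored = []
    for doc_id, text in candidates.items():
        vec = vectorize(text)
        score = max((cosine(vec, s) for s in seed_vectors), default=0.0)
        scored.append((score, doc_id))
    scored.sort(reverse=True)
    return [(doc_id, score) for score, doc_id in scored]


# Hypothetical seed set (found by search query plus human review) and candidates.
seeds = ["merger agreement between the two companies"]
candidates = {
    "a": "draft merger agreement for the companies",
    "b": "lunch menu for friday",
}
ranking = rank_by_seed(seeds, candidates)
print(ranking)
```

Note that the code never "knows" what a merger is; it only compares word counts against the seed documents, which is why the quality of the seed set matters.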

9. Machine learning
There are two types of machine learning (i.e., bottom-up, statistical) systems. In practice, aspects of both approaches can be used in a PC system.

10. Decision Support System
Since a complex system such as PC cannot envision every possibility, an actual system would incorporate aspects of a DSS (Decision Support System) which allows a human to intervene at appropriate points and, given feedback and diagnostics by the program, make "decisions" for the program to continue.
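
One possible way to picture such an intervention point is sketched below: documents whose relevance score is clearly high or clearly low are decided automatically, and the uncertain middle band is handed to a human. The thresholds, labels, and function name are assumptions made for this illustration only.

```python
def triage(scores, low=0.3, high=0.7):
    """Split scored documents into automatic decisions and a list of
    documents that need a human decision (assumed thresholds)."""
    decided = {}
    needs_human = []
    for doc_id, score in scores.items():
        if score >= high:
            decided[doc_id] = "recalled"
        elif score <= low:
            decided[doc_id] = "not recalled"
        else:
            # Too uncertain for the program; ask the human reviewer.
            needs_human.append(doc_id)
    return decided, needs_human


# Hypothetical relevance scores in the range 0..1.
decided, needs_human = triage({"a": 0.9, "b": 0.1, "c": 0.5})
print(decided, needs_human)
```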
