
What Movie Ratings teach us about Data Quality


In previous posts on this blog, I have discussed aspects of data quality using travel reviews, the baseball strike zone, Christmas songs, the wind chill factor, and the bell curve.

Grab a big popcorn and a giant soda since this post discusses data quality using movie ratings.

The Following Post has been Approved for All Audiences

In the United States prior to 1922, every state and many cities had censorship boards that prevented movies from being shown in local theaters on the basis of “immoral” content. What exactly was immoral varied by censorship board.

In 1922, the trade association that later became the Motion Picture Association of America (MPAA), representing all the major American movie studios, was formed in part to encourage the movie industry to censor itself. Will Hays, its first president, helped develop what came to be called the Hays Code, which used a list of criteria to rate movies as either moral or immoral.

For three decades, if a movie failed the Hays Code, most movie theaters around the country would not show it. After World War II ended, however, views on movie morality began to change. Frank Sinatra received an Oscar nomination for his role as a heroin addict in the 1955 drama The Man with the Golden Arm. Jack Lemmon received an Oscar nomination for his role as a cross-dressing musician in the 1959 comedy Some Like It Hot. Both movies failed the Hays Code, but theaters booked them on the strength of good reviews, and both became big box office hits.

Then in 1968, a landmark Supreme Court decision (Ginsberg v. New York) ruled that states could “adjust the definition of obscenity as applied to minors.” Fearing the revival of local censorship boards, the MPAA created a new rating system intended to help parents protect their children from obscene material. Even though the ratings carried no legal authority, parents were encouraged to use them as a guide in deciding what movies their children should see.

While a few changes have occurred over the years (most notably adding PG-13 in 1984), these are the same movie ratings we know today: G (General Audiences, All Ages Admitted), PG (Parental Guidance Suggested, Some Material may not be Suitable for Children), PG-13 (Parents Strongly Cautioned, Some Material may be Inappropriate for Children under 13), R (Restricted, Under 17 requires Accompanying Parent or Adult Guardian), and NC-17 (Adults Only, No One 17 and Under Allowed). For more on these ratings and how they are assigned, read this article by Dave Roos.

What Movie Ratings teach us about Data Quality

Just as the MPAA learned from the failure of the Hays Code to rate movies as either moral or immoral, data quality cannot simply be rated as good or bad. Perspectives about the quality standards for data, like the moral standards for movies, change over time. For example, consider how big data challenges traditional data quality standards.

Furthermore, good and bad, like moral and immoral, are ambiguous. A specific context is required to help interpret any rating. For the MPAA, the specific context became rating movies based on obscenity from the perspective of parents. For data quality, the specific context is based on fitness for the purpose of use from the perspective of business users.
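To make "fitness for the purpose of use" concrete, here is a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: the CustomerRecord fields, the two use cases, and the rules attached to them. The point is simply that the same record earns a different rating depending on the context in which it is judged.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class CustomerRecord:
    # Hypothetical fields, assumed for this example only.
    name: str
    email: Optional[str]
    postal_address: Optional[str]


# Each (hypothetical) business purpose supplies its own fitness rule.
FITNESS_RULES: Dict[str, Callable[[CustomerRecord], bool]] = {
    "email_campaign": lambda r: bool(r.email),
    "catalog_mailing": lambda r: bool(r.postal_address),
}


def rate(record: CustomerRecord, purpose: str) -> str:
    """Rate a record as 'fit' or 'unfit' for one specific purpose."""
    return "fit" if FITNESS_RULES[purpose](record) else "unfit"


# One record, two contexts, two different ratings.
record = CustomerRecord(name="Pat Example", email="pat@example.com", postal_address=None)
ratings = {purpose: rate(record, purpose) for purpose in FITNESS_RULES}
print(ratings)
```

The record is fit for the email campaign and unfit for the catalog mailing, so asking whether it is "good" or "bad" data has no answer until a purpose is named.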

Adding context, however, does not guarantee everyone will agree with the rating. Debates rage over the rating given to a particular movie. Likewise, many within your organization will disagree with the quality assessment of certain data. So the next time your organization calls a meeting to discuss its data quality standards, you might want to grab a big popcorn and a giant soda since the discussion could be as long and dramatic as a Peter Jackson trilogy.
