Lab 3: Data Quality Assessments

 


This week in lab we completed an accuracy assessment by determining the completeness of two road networks: the Centerlines dataset and the TIGER census dataset. Street centerlines come from local government datasets, while the TIGER (Topologically Integrated Geographic Encoding and Referencing) dataset is freely available in the US as part of the census. There is no standard for measuring completeness, but the method carried out in this lab follows several published analyses. To help complete the steps of this lab, I referenced the methodology used in "How good is volunteered geographical information? A comparative study of OpenStreetMap and Ordnance Survey datasets" by Haklay.

First, I used the Clip tool on each of the two road networks so that only the roads inside the grid extent remained. To ensure that length was calculated only for the part of each road segment inside a grid cell, rather than for the entire segment, I used the Intersect tool to split the road segments at each grid boundary. In the attribute tables of the new clipped TIGER and Centerlines layers, I added a new field and used the Calculate Geometry tool to find the length in kilometers of each segment associated with a grid ID. Finally, I used the Summarize Within tool by the GRID_ID field so that I could use these numbers to complete a numerical summary, calculate the completeness of each road network, and compare the quality of the two datasets.
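The per-cell summary and comparison can be sketched in plain Python. This is only an illustration, not the ArcGIS workflow itself. The function names, the sample values, and the percent-difference formula (Centerlines minus TIGER, relative to Centerlines) are all assumptions, since the exact formula used in the lab is not stated here.

```python
from collections import defaultdict


def total_length_by_cell(segments):
    """Sum segment lengths (km) per grid cell.

    segments: iterable of (grid_id, length_km) pairs, one per road piece
    that has already been split at the grid boundaries.
    """
    totals = defaultdict(float)
    for grid_id, length_km in segments:
        totals[grid_id] += length_km
    return dict(totals)


def percent_difference(centerline_km, tiger_km):
    """Assumed formula: positive when Centerlines has more road length."""
    if centerline_km == 0:
        return None  # undefined when the reference length is zero
    return (centerline_km - tiger_km) / centerline_km * 100.0


# Hypothetical values for illustration only, not the lab's data.
centerlines = [(1, 2.0), (1, 3.0), (2, 4.0)]
tiger = [(1, 4.0), (2, 2.0), (2, 3.0)]

cl_totals = total_length_by_cell(centerlines)
tg_totals = total_length_by_cell(tiger)

for grid_id in sorted(cl_totals):
    diff = percent_difference(cl_totals[grid_id], tg_totals.get(grid_id, 0.0))
    print(grid_id, diff)
```

Under this assumed sign convention, a positive value means the Centerlines layer has more road length in that cell, and a negative value means TIGER has more.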

Shown above is a choropleth map of the results, where the percentage differences are placed into seven categories. Where the TIGER dataset is more complete, grid cells are red; where the Centerlines dataset is more complete, grid cells are blue. Where the two are similar and have the lowest percent differences, the grid cells are white.
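As a rough sketch of how a seven-class diverging scheme like this assigns colors, the snippet below bins a percent difference into one of seven classes. The break values and color names are hypothetical, since the map's actual class breaks are not listed here. It also assumes that negative values mean TIGER is more complete and positive values mean Centerlines is more complete.

```python
from bisect import bisect_right

# Hypothetical class breaks (percent difference); six breaks give seven classes.
BREAKS = [-50.0, -25.0, -5.0, 5.0, 25.0, 50.0]

# Diverging scheme: red side = TIGER more complete, blue side = Centerlines more complete.
COLORS = ["dark red", "red", "light red", "white", "light blue", "blue", "dark blue"]


def classify(pct_diff):
    """Return the color class for a percent difference value."""
    return COLORS[bisect_right(BREAKS, pct_diff)]


for value in (-60.0, -10.0, 0.0, 30.0, 75.0):
    print(value, classify(value))
```

Centering the white class on zero keeps cells where the two datasets nearly agree visually neutral, so the red and blue cells stand out.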
