Total Population

2020 projection of the US population, grouped by Zip3, gender, and age range, based on prior years' census data.

Data Acquisition

The ZIP source dataset contains 1,622,832 rows, each listing a population, age range, gender, and Zip5. We discarded the rows listing the total population by Zip5 and gender, and summed the total population of each Zip5. Census source data for 2011 through 2017 was also gathered.
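
The filtering step can be illustrated with a minimal, dependency-free sketch. The row layout (zip5, gender, age_range, population), the "Total" label, and the sample values are all assumptions made for illustration; the actual source file may mark its total rows differently.

```python
from collections import defaultdict

# Made-up rows in a hypothetical (zip5, gender, age_range, population) layout.
rows = [
    ("02139", "F", "Total", 300),
    ("02139", "F", "20-24", 120),
    ("02139", "F", "25-29", 180),
    ("02139", "M", "Total", 250),
    ("02139", "M", "20-24", 100),
    ("02139", "M", "25-29", 150),
]

TOTAL_LABEL = "Total"  # assumed marker for pre-aggregated rows

# Keep only the rows broken down by age range. Leaving the pre-aggregated
# totals in would presumably count the same people twice when summing.
detail_rows = [row for row in rows if row[2] != TOTAL_LABEL]

# Sum the remaining rows to get the total population of each Zip5.
zip5_totals = defaultdict(int)
for zip5, gender, age_range, population in detail_rows:
    zip5_totals[zip5] += population
```

With these sample rows, four detail rows remain and the single Zip5 sums to 550.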

Extraction, Transformations & Projections

The process was performed in Python using the pandas and scikit-learn libraries. The source data was extracted from each file and transformed into a table with the following columns: Zip3, Gender, Age range, and one population column per year from 2010 through 2017.

The populations of each Zip5 (by gender and age range) were summed to obtain the population at the Zip3 level. Linear regression with rectification was then fitted to the 2010 through 2017 populations of each row to project that row's population for 2020.
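
The aggregation and projection steps can be sketched as follows. This is a dependency-free illustration rather than the original pandas and scikit-learn code: the row layout, function names, and sample values are hypothetical, Zip3 is assumed to be the first three digits of the zero-padded Zip5, and "rectification" is assumed to mean clamping negative projections to zero.

```python
from collections import defaultdict

YEARS = list(range(2010, 2018))  # 2010 through 2017, as in the source table


def aggregate_to_zip3(rows):
    """Sum Zip5-level yearly populations into Zip3-level rows.

    `rows` is an iterable of (zip5, gender, age_range, populations) tuples,
    where `populations` has one value per year in YEARS (hypothetical layout).
    """
    totals = defaultdict(lambda: [0.0] * len(YEARS))
    for zip5, gender, age_range, populations in rows:
        zip3 = str(zip5).zfill(5)[:3]  # keep leading zeros, e.g. 2139 -> "021"
        key = (zip3, gender, age_range)
        for i, value in enumerate(populations):
            totals[key][i] += value
    return dict(totals)


def project_population(populations, target_year=2020, years=YEARS):
    """Fit an ordinary least-squares line and evaluate it at target_year.

    Assumption: "rectification" means clamping negative projections to zero.
    """
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(populations) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, populations))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return max(0.0, slope * target_year + intercept)


# Made-up sample rows, for illustration only.
sample_rows = [
    ("02139", "F", "20-24", [100, 110, 120, 130, 140, 150, 160, 170]),
    ("02140", "F", "20-24", [50, 50, 50, 50, 50, 50, 50, 50]),
    ("99901", "M", "85+", [35, 30, 25, 20, 15, 10, 5, 0]),
]

zip3_table = aggregate_to_zip3(sample_rows)
projections = {key: project_population(pops) for key, pops in zip3_table.items()}
```

In this sample, the two "021" rows merge into a single series growing by 10 per year, so its fitted line reaches 250 in 2020. The shrinking "999" series would extrapolate to a negative value, which the clamp replaces with zero.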