|N||Email address||Description||Link to the data||Instructions|
|firstname.lastname@example.org||The dataset provides sales records of a certain good for different countries.|
It provides details such as the order date, unit price, and profit for that
specific item.
|http://eforexcel.com/wp/downloads-18-sample-csv-files-data-sets-for-testing-sales/||There are files of different sizes depending on the number of sales records you|
want in each CSV file. Click on a size to download that file. Some parts of the
data are blurred out in the ship date and order date columns.
|email@example.com||The data is about the PISA assessment and can display the overall scores of|
different countries in categories like math, reading, and science.
|The website, https://nces.ed.gov/surveys/international/ide/, also allows the|
user to choose certain variables or criteria that they want to see about the
data and get a report containing all the data that they want. For example, for
my particular set of data I chose to display all years, but you can also select
only one year to display.
|firstname.lastname@example.org||esports league statistics||https://assets.blz-contentstack.com/v3/assets/blt321317473c90505c/blt85c849e5a9c818e1/5e9d|
|Organized by stage of competition, team that won/lost each game, map played,|
score per game, overall match winner
|email@example.com||Connecticut's Accidental Drug Related Deaths from 2012-2016.||https://catalog.data.gov/dataset/accidental-drug-related-deaths-january-2012-sept-2015||This dataset provides a lot of information that could be analyzed, from age and|
sex of the victims to the drug used.
|firstname.lastname@example.org||The data details defense exports made by the US government to various countries|
in terms of monetary value. I found this interesting because of the wide variety
of countries that I did not expect to be on this list, such as Aruba and the
|From the link, scroll down to download the pdf file.|
|email@example.com||Number of dog names in Anchorage Municipality||https://catalog.data.gov/dataset/dog-names||Download from the link (there are CSV and other forms available).|
|firstname.lastname@example.org||names of babies||https://catalog.data.gov/dataset/most-popular-baby-names-by-sex-and-mothers-ethnic-group-n|
|Click Download in the top-right corner to download it.|
|email@example.com||This data set describes the salaries of different occupations in New York State.||https://catalog.data.gov/dataset/occupational-employment-statistics||Simply click the download button next to the CSV file or any other type of file|
you would like. This data set describes the mean wage, median wage, and entry
wage of different occupations in different parts of NY along with their
standard occupational code.
|firstname.lastname@example.org||Contains graduation outcomes for the state of New York from 2005 to 2015||https://catalog.data.gov/dataset/regents-exam-results||Can be found on data.gov. Also available in RDF, JSON, and XML formats.|
|email@example.com||Assessment of how properties in NY pay taxes.||https://chriswhong.com/open-data/liberating-data-from-nyc-property-tax-bills/||This is a very big file, with 1,048,576 rows (exactly Excel's row limit), and there was an error message|
that it was too big, so there are likely more rows than that.
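Since 1,048,576 is the most rows Excel can show, the real size of a file like this is easier to check outside a spreadsheet. A minimal sketch using Python's standard csv module, which streams the file one record at a time instead of loading it all; the file name in the usage comment is a placeholder:

```python
import csv


def count_data_rows(path: str) -> int:
    """Count data rows (CSV records, not raw lines) without loading the file into memory."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row, if there is one
        return sum(1 for _ in reader)


# Usage (hypothetical file name):
# print(count_data_rows("property_tax_bills.csv"))
```

Counting records through csv.reader rather than counting newline characters means a quoted field that contains a line break is still counted as one row.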
|firstname.lastname@example.org||Connecticut voters||https://connvoters.com/download.html||Here's Michigan: https://michiganvoters.info/download.html,|
and here are each state's rules on voter registration lists:
-registration-lists.aspx (get the rest yourself)
|email@example.com||"CVE® is a list of entries—each containing an identification number, a|
description, and at least one public reference—for publicly known cybersecurity
vulnerabilities." - from their site
|https://cve.mitre.org/data/downloads/index.html||This data is used by the US National Vulnerability Database, but that database|
doesn't have CSV downloads, only JSON and XML.
|firstname.lastname@example.org||A large group of businesses that were charged for not-very-legal activities,|
along with their location, name, date charged, reason for being charged, and
trial result.
|https://data.cityofnewyork.us/Business/Charges/5fn4-dr26||Go to Business, then datasets, and it's the 9th one (for me at least)|
|email@example.com||This shows consumer complaints against businesses and the reason for the|
|https://data.cityofnewyork.us/Business/Consumer-Services-Mediated-Complaints/nre2-6m2s||it has a restitution column which shows how much the person that complained was|
|firstname.lastname@example.org||The data that I found was about consumer complaints against businesses that|
were mediated by the DCA Consumer Services Division during the last and current
calendar years or general complaints against businesses.
|I thought the data in the data set was very interesting because you are given|
not only the complete address but also the longitude and latitude. Therefore it
is possible to pair each business name with its longitude and latitude and maybe
plot the businesses on a map.
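Pairing each business with its coordinates can be sketched with csv.DictReader. The column names "Business Name", "Latitude", and "Longitude" are assumptions, not checked against the actual export, so adjust them to match the file's header:

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple


def business_points(rows: Iterable[Dict[str, str]],
                    name_col: str = "Business Name",  # assumed column name
                    lat_col: str = "Latitude",         # assumed column name
                    lon_col: str = "Longitude") -> List[Tuple[str, float, float]]:
    """Return (name, latitude, longitude) for every row that has both coordinates."""
    points = []
    for row in rows:
        lat = (row.get(lat_col) or "").strip()
        lon = (row.get(lon_col) or "").strip()
        if not lat or not lon:
            continue  # skip rows with a missing coordinate
        points.append((row[name_col], float(lat), float(lon)))
    return points


# Usage (hypothetical file name):
# with open("consumer_complaints.csv", newline="") as f:
#     pts = business_points(csv.DictReader(f))
```

The resulting list can be handed to any plotting library as x = longitude, y = latitude.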
|email@example.com||Citywide Payroll data||https://data.cityofnewyork.us/City-Government/Citywide-Payroll-Data-Fiscal-Year-/k397-673e||Just a normal csv file.|
|firstname.lastname@example.org||Average class sizes of certain classes in schools in 2010-2011 (I even found|
some data about Stuyvesant).
|https://data.cityofnewyork.us/Education/2010-2011-Class-Size-School-level-detail/urz7-pzb3||If you want the csv file just hit the export button in the top right and click|
|email@example.com||2010 AP results for NYC schools||https://data.cityofnewyork.us/Education/2010-AP-College-Board-School-Level-Results/itfs-ms|
|There's no paywall, and it can be downloaded as a CSV file.|
|firstname.lastname@example.org||The data covers 258 schools in NYC and provides information regarding how many|
AP test takers were in that school in 2010, how many APs were taken in the
respective school, and the number of exams with a score of 3, 4, or 5.
|Many cells are left blank if no data is available or if the number is 0,|
instead of the actual number 0 or N/A.
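Because a blank cell here can mean either "no data" or 0, it is safer to keep blanks distinct from a real zero when loading the file. A small sketch (both helper names are hypothetical) that turns blanks into None so they are left out of totals instead of silently counting as 0:

```python
from typing import Iterable, Optional


def parse_count(cell: str) -> Optional[int]:
    """Convert a cell to int; a blank cell becomes None (unknown), not 0."""
    cell = cell.strip()
    return int(cell) if cell else None


def total_known(cells: Iterable[str]) -> int:
    """Sum only the cells that actually contain a number."""
    return sum(v for v in map(parse_count, cells) if v is not None)
```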
|email@example.com||# of APs taken in various schools and # of passing grades||https://data.cityofnewyork.us/Education/2012-AP-Results/9ct9-prf9||There are a lot of lines with bad data ('s'), and there are also a few schools with|
|firstname.lastname@example.org||My data is the 2012 AP results. It lists the school, number of AP test takers,|
number of AP tests taken, and the number of AP tests passed.
|https://data.cityofnewyork.us/Education/2012-AP-Results/9ct9-prf9||It has problems very similar to the class example, such as 's' instead of|
numbers, and the website does not do a good job of sorting the information.
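One way to handle the 's' entries is to treat them as missing and compute a pass rate only where both counts are present. A sketch; the function names are hypothetical and the inputs are the raw cell strings for "exams taken" and "exams passed":

```python
from typing import Optional


def to_number(cell: str) -> Optional[int]:
    """Return the cell as an int, or None for suppressed ('s') or blank cells."""
    cell = cell.strip()
    if cell == "" or cell.lower() == "s":
        return None
    return int(cell)


def pass_rate(exams_taken: str, exams_passed: str) -> Optional[float]:
    """Fraction of exams scoring 3 or higher; None if either count is unusable."""
    taken, passed = to_number(exams_taken), to_number(exams_passed)
    if taken is None or passed is None or taken == 0:
        return None
    return passed / taken
```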
|email@example.com||School Student Demographics 2013-18||https://data.cityofnewyork.us/Education/2013-2018-Demographic-Snapshot-School/s52a-8aq6||Press Export and then download as a CSV file.|
|firstname.lastname@example.org||this is a VADIR (Violent And Disruptive Incidents Report) for schools in New|
|https://data.cityofnewyork.us/Education/2014-2015-VADIR-INCIDENTS-3/diks-hcwd/data||Because there is a separate column for each crime, there are many more|
columns than in other lists we've seen.
|email@example.com||2016 School Safety Report||https://data.cityofnewyork.us/Education/2016-2017-School-Safety-Report/rear-wh5i/data||The data is exportable to CSV files, so you can just download it to your|
|firstname.lastname@example.org||SHSAT offers by New York City Middle School (2017-18)||https://data.cityofnewyork.us/Education/2017-2018-SHSAT-Admissions-Test-Offers-By-Sending-|
|This data set is very easy to access, as the data is presented very clearly,|
and all of the columns are concepts we are familiar with (Middle Schools,
Offers, Applicants, etc.) One feature I really enjoyed exploring was the
visualization feature, which allows the user to click through different types
of graphs and see the data represented in a variety of ways. The most helpful
graph is probably the Combo Chart, because it clearly presents the rows of
middle schools, and their numbers in descending order. Highly recommend taking
|email@example.com||SHSAT school acceptances||https://data.cityofnewyork.us/Education/2017-2018-SHSAT-Admissions-Test-Offers-By-Sending-|
|firstname.lastname@example.org||2017 - 2020 Monthly Grade Level Attendance by School||https://data.cityofnewyork.us/Education/2017-2020-Monthly-Grade-Level-Attendance-by-School|
|Can be accessed as a csv file. Really simple to understand, with 10 columns,|
containing details about the school, the month, and the number of absences and
presences each month. It is published by the DOE, and it is updated monthly.
|email@example.com||List of Schools Receiving a Quality Review in 2018-2019||https://data.cityofnewyork.us/Education/2018-2019-Quality-Review-School-List/2rr4-rfvc/dat|
|Search NYC OpenData for the list of school names; it is the first result.|
|firstname.lastname@example.org||This data set is released by the NYC DOE. It covers the number of eighth|
graders, the number of SHSAT test takers, and the number of students who got
offers from specialized high schools at each NYC middle school in 2018.
|This data set is interesting because it's about SHSAT admission numbers for the|
year 2018-2019, which is the year we (as the class of 2022) took the SHSAT and
entered specialized high school. It shares a similar format with the 2010 SAT
data we worked on before.
|email@example.com||Student attendance per school from 2017-2020||https://data.cityofnewyork.us/Education/2019-2020-SHSAT-Admissions-Test-Offers-By-Sending-|
|firstname.lastname@example.org||Number of offers into specialized high schools 2019-2020||https://data.cityofnewyork.us/Education/2019-2020-SHSAT-Admissions-Test-Offers-By-Sending-|
|• It contains the name of the school, the school, the number of students in|
high school admissions, the number of students who took the SHSAT, and the
total number of offers into specialized high schools for that year. • Whenever
the number of test-takers or students accepted is 5 or lower, an exact value is
not provided; instead, the range 0-5 is given. • The same issues seen in class
arise with the website's built-in sorting: 20 is placed above 100 when the data
is sorted in "descending" order because it sorts strings, not numbers.
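The sorting problem happens because, compared as text, "20" comes after "100" (the character '2' is greater than '1'). A sketch of a numeric sort key follows. Treating the suppressed range "0-5" as its upper bound, 5, is an assumption made here so that those rows still sort below larger counts:

```python
from typing import List


def offer_count(cell: str) -> int:
    """Numeric value of a cell; the suppressed range '0-5' is treated as 5 (an assumption)."""
    cell = cell.strip()
    if cell == "0-5":
        return 5
    return int(cell)


def sort_descending(cells: List[str]) -> List[str]:
    """Sort cell strings by their numeric value, largest first."""
    return sorted(cells, key=offer_count, reverse=True)
```

For example, sorted(["20", "100", "0-5"], reverse=True) compares text and gives ["20", "100", "0-5"], while sort_descending gives ["100", "20", "0-5"].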
|email@example.com||Free meals in time of crisis||https://data.cityofnewyork.us/Education/COVID-19-Free-Meal-Locations/2rg5-7s3w||Dunno|
|firstname.lastname@example.org||A table with information about squirrel sightings (location, date, color, etc.)||https://data.cityofnewyork.us/Environment/2018-Central-Park-Squirrel-Census-Squirrel-Data/|
|email@example.com||The data is the record of all the spotted squirrels in central park. There is|
information about the location spotted, physical attributes, and the action
that the squirrel was spotted doing.
|The data set is relatively small; it has only about 3000 rows, and each row|
represents a different squirrel. There are 31 columns, and the data types stored
are integers, floats, strings, true/false (boolean), and location
(longitude/latitude). A lot of entries are missing data, especially in the
true/false columns. The data was last updated in October 2019.
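Reading the true/false columns as Python booleans while keeping blanks as missing avoids counting an empty cell as False. A sketch; matching the spellings "true" and "false" (in any case) is an assumption about how the export writes these values:

```python
from typing import Iterable, Optional


def parse_bool(cell: str) -> Optional[bool]:
    """Map 'true'/'false' (any case) to a bool; anything else, including blanks, to None."""
    value = cell.strip().lower()
    if value == "true":
        return True
    if value == "false":
        return False
    return None


def share_true(cells: Iterable[str]) -> Optional[float]:
    """Fraction of known (non-missing) values that are True; None if all are missing."""
    known = [v for v in map(parse_bool, cells) if v is not None]
    return sum(known) / len(known) if known else None
```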
|firstname.lastname@example.org||Description of Squirrels spotted in Central Park||https://data.cityofnewyork.us/Environment/2018-Central-Park-Squirrel-Census-Squirrel-Data/|
|I found this in the Environment section of NYC OpenData and scrolled down. It|
includes a color map, which is helpful in visualizing the data. It includes
location, color, and action of squirrels.
|email@example.com||info on many of the squirrels in Central Park||https://data.cityofnewyork.us/Environment/2018-Central-Park-Squirrel-Census-Squirrel-Data/|
|specifies color, age, location found, and what they were doing when they were|
|firstname.lastname@example.org||It's a squirrel census, which records the number of sightings and details of|
squirrels (location coordinates, age, primary and secondary fur color,
elevation, activities, communications, and interactions).
|You can download it as a CSV file. The species covered by this squirrel census|
is the eastern gray squirrel (Sciurus carolinensis).
|email@example.com||squirrels in central park||https://data.cityofnewyork.us/Environment/2018-Central-Park-Squirrel-Census-Squirrel-Data/|
|firstname.lastname@example.org||The database I found was of a squirrel census in Central Park in 2018. The|
data was collected for a project titled The Squirrel Census, a multimedia
science, design, and storytelling project that focuses specifically on the
Eastern Gray Squirrel. The squirrels are counted in the database, and further
details about each squirrel are listed as well. For example, the file provides
data on a specific squirrel's fur color, location coordinates, age, elevation,
activities, communications, and interactions with fellow squirrels and other
life forms in Central Park.
|To look at the data without downloading just click the 'View Data' button and|
to download the file as a csv (or other) file click the 'Export' button.
|email@example.com||The data describes the squirrel population in Central Park, listing their age,|
primary fur color, location, as well as movements (running, chasing, climbing,
eating) that are listed as True or False.
|To access the data, export/download it as a CSV or TSV file.|
|firstname.lastname@example.org||A very detailed account of all the squirrels in Central Park, including where|
they are and what they look like (secondary fur color). This is part of a
project called, appropriately enough, Squirrel Census. Also, quite
surprisingly, this came up when I searched "food." I would rather not know why.
|https://data.cityofnewyork.us/Environment/2018-Squirrel-Census-Fur-Color-Map/fak5-wcft||Hovering over a circle lets you see more information about it. Pretty easy to|
|email@example.com||Air quality (many metrics) of many areas in NYC||https://data.cityofnewyork.us/Environment/Air-Quality/c3uy-2p5r||It's a CSV|
|firstname.lastname@example.org||The green infrastructure put around the city.||https://data.cityofnewyork.us/Environment/DEP-Green-Infrastructure/spjh-pz7h#revert||Click the Export tab to download it as a CSV file.|
|email@example.com||This is a website that shows all the places where you can drop off food scraps.|
It gives the longitude and latitude of each location. It also shows the hours
of operation.
|https://data.cityofnewyork.us/Environment/Food-Scrap-Drop-Off-Locations-in-NYC/if26-z6xq||It seems you have to download it as a CSV file; the attachment listed is not usable.|
|Beside the map of New York City are tabs. To access the data, click on the|
Export tab and scroll down to find a CSV file.
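Since each drop-off site has a latitude and longitude, finding the closest one to a given point is a short calculation with the haversine (great-circle) formula. A sketch; the (name, lat, lon) tuple layout is a hypothetical shape for rows already read from the CSV:

```python
import math
from typing import List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_site(lat: float, lon: float,
                 sites: List[Tuple[str, float, float]]) -> Tuple[str, float]:
    """Return (site name, distance in km) for the site closest to (lat, lon)."""
    best = min(sites, key=lambda s: haversine_km(lat, lon, s[1], s[2]))
    return best[0], haversine_km(lat, lon, best[1], best[2])
```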
|firstname.lastname@example.org||Information about calls for animal assistance, relocation, and/or rescue|
completed by the Urban Park Rangers.
|Some of the dates are just a series of hashtags and not all of the cells are|
|email@example.com||Water Consumption in NYC - How much water New Yorkers are using, 1979-2008||https://data.cityofnewyork.us/Environment/Water-Consumption-In-The-New-York-City/ia2d-e54m||It can be exported in .csv format. It's pretty straightforward, and from a quick|
glance, NYC is using less water per day even though its population is growing.
Maybe another dataset could give a clue as to how.
|firstname.lastname@example.org||New York water consumption||https://data.cityofnewyork.us/Environment/Water-Consumption-In-The-New-York-City/ia2d-e54m|
|Click on the link, then Export, and download as CSV. It states the population every year,|
so you can see whether that correlates with water consumption.
|email@example.com||Restaurant Inspection Results; reports based on how hygienic a restaurant is.||https://data.cityofnewyork.us/Health/DOHMH-New-York-City-Restaurant-Inspection-Results/rs6|
|You can search by type of violation, zip code, borough where the violation occurred,|
type of cuisine, just a list of every restaurant that received a citation, and
much more. More importantly, the grading system is as follows: Grades A, B, and
C as expected, then N/Z for grade pending or ungraded, then P for grade pending
after an inspection resulted in a closure of the restaurant (basically an F in
|firstname.lastname@example.org||This data reports the number of HIV/AIDS diagnoses by neighborhood, sex, and|
race/ethnicity in NYC from 2010 to 2013.
|There are a lot of interesting ways for Python to process the data, but some|
rows use "All" races or "Unknown" race, as well as "All" sexes. The data is also
separated by year from 2010 to 2013.
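Rows labeled "All" are totals of the other rows, so adding them to the per-group rows would double count. A sketch that keeps only individual groups; "Unknown" is kept because it is a real group, not a total. The column names "Race/Ethnicity" and "Sex" are hypothetical placeholders:

```python
from typing import Dict, Iterable, List, Sequence

AGGREGATE_LABELS = {"All"}  # labels that mark summary (total) rows


def individual_rows(rows: Iterable[Dict[str, str]],
                    columns: Sequence[str] = ("Race/Ethnicity", "Sex")) -> List[Dict[str, str]]:
    """Drop summary rows whose race or sex is 'All' so totals are not double counted."""
    return [r for r in rows if not any(r[c] in AGGREGATE_LABELS for c in columns)]
```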
|email@example.com||Infant Mortality rate by maternal race/ethnicity||https://data.cityofnewyork.us/Health/Infant-Mortality/fcau-jc6k/data||The data has some sections labeled Other/Two or More in the ethnicity category|
that have no numbers for the mortality rates.
|firstname.lastname@example.org||This data states the concentration of mercury and other heavy metals present|
within consumer products.
|There are some pieces of data where it is stated that there is a negative|
concentration of the metal in question, which is odd.
|email@example.com||Information about dogs owned in NYC. Names, gender, breed, DOB, and borough|
that the dog resides in.
|https://data.cityofnewyork.us/Health/NYC-Dog-Licensing-Dataset/nu7n-tubp||The information is updated yearly and it's probably possible to find your dog|
on the list if you own one
|firstname.lastname@example.org||This is a chart of hospitals, questions about the hospital, one answer, and the|
percentage of people who gave that answer.
|Click the link and scroll all the way down. Click View Data. Then, on the right|
side, there will be multiple options. Click Export. Under the export option
there will be a download option. Click CSV.
|email@example.com||NYC leading causes of death||https://data.cityofnewyork.us/Health/New-York-City-Leading-Causes-of-Death/jb7j-dtam||7 columns: year, leading cause, sex, race ethnicity, deaths, death rate, age|
adjusted death rate; there are periods where there is no data in the last 3
columns. Last 3 columns are integers/floating point numbers.
|firstname.lastname@example.org||Population Stats: Causes of Death per Year divided by sex and ethnicity.||https://data.cityofnewyork.us/Health/New-York-City-Leading-Causes-of-Death/jb7j-dtam||The data is not in an easy-to-see order; it is somewhat random. It doesn't go|
F, Hispanic to M, Hispanic to F, Asian. So to get a visual idea, you would
need to sort the data pretty well.
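Sorting on several columns at once (year, then race/ethnicity, then sex) puts related rows next to each other. A sketch using sorted with a tuple key; the column names "Year", "Race Ethnicity", and "Sex" are assumed here, not checked against the file:

```python
from typing import Dict, List


def sort_rows(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Order rows by year (numerically), then race/ethnicity, then sex."""
    return sorted(rows, key=lambda r: (int(r["Year"]), r["Race Ethnicity"], r["Sex"]))
```

Tuples compare element by element, so the second and third keys only matter when the earlier ones are equal.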
|email@example.com||This dataset provides the leading causes of death in New York City from|
|https://data.cityofnewyork.us/Health/New-York-City-Leading-Causes-of-Death/jb7j-dtam||One of the columns in this data set is "Age Adjusted Death Rate", which is a|
death rate adjusted for differences in the age distribution of the populations
being compared, so that groups of different races and ethnicities can be compared fairly.
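Age-adjusted rates are commonly computed by direct standardization: each age group's death rate is weighted by that age group's share of a fixed standard population, and the weighted rates are summed. The data set does not say which method or standard population it used, so this is only a sketch of the general technique, with the age groups and weights left as inputs:

```python
from typing import Sequence


def age_adjusted_rate(age_specific_rates: Sequence[float],
                      standard_population: Sequence[float]) -> float:
    """Direct standardization: sum of age-specific rates weighted by standard-population shares.

    age_specific_rates[i] is the death rate (e.g. per 100,000) in age group i;
    standard_population[i] is the standard population count for the same age group.
    """
    if len(age_specific_rates) != len(standard_population):
        raise ValueError("need one standard-population count per age group")
    total = sum(standard_population)
    return sum(r * p / total for r, p in zip(age_specific_rates, standard_population))
```

With rates [10, 100] and standard population [3, 1], the weights are 0.75 and 0.25, giving 10 * 0.75 + 100 * 0.25 = 32.5.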
|firstname.lastname@example.org||New York City's leading causes of death||https://data.cityofnewyork.us/Health/New-York-City-Leading-Causes-of-Death/jb7j-dtam||Last updated in February 2020, but only includes information up to the year|
2014. There are 1094 rows and 7 columns.
|email@example.com||Leading cause of death||https://data.cityofnewyork.us/Health/New-York-City-Leading-Causes-of-Death/jb7j-dtam||Just click View Data. There are 7 columns and it provides with many things u|
|firstname.lastname@example.org||nyc leading causes of death||https://data.cityofnewyork.us/Health/New-York-City-Leading-Causes-of-Death/jb7j-dtam||the data is organized by race, sex, and death rate (both normal and age|
|email@example.com||Leading Causes of Death in NYC||https://data.cityofnewyork.us/Health/New-York-City-Leading-Causes-of-Death/jb7j-dtam||Visualize->Dimension->Leading Cause; bar chart or column chart for best|
|firstname.lastname@example.org||New York City Leading Causes of Death||https://data.cityofnewyork.us/Health/New-York-City-Leading-Causes-of-Death/jb7j-dtam||To download the data, go on the website and press export, which will allow you|
to download the data in different formats.
|email@example.com||Leading Causes of Death||https://data.cityofnewyork.us/Health/New-York-City-Leading-Causes-of-Death/jb7j-dtam/data||You can filter the data for just a specific group of people (sex and race) or|
for a specific year. For example, the leading cause of death in Asian and Pacific
Islander women in 2014 was cancer.
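The same filter-then-pick-the-largest lookup can be done in Python once the CSV is loaded as a list of dicts. A sketch; the column names ("Year", "Sex", "Race Ethnicity", "Leading Cause", "Deaths") and the value spellings are hypothetical and should be matched to the real header:

```python
from typing import Dict, Iterable, Optional


def leading_cause(rows: Iterable[Dict[str, str]], year: str, sex: str,
                  race: str) -> Optional[str]:
    """Return the cause with the most deaths for one year/sex/race group, or None if no rows match."""
    matches = [r for r in rows
               if r["Year"] == year and r["Sex"] == sex and r["Race Ethnicity"] == race
               and r["Deaths"].strip().isdigit()]  # skip rows whose death count is missing
    if not matches:
        return None
    return max(matches, key=lambda r: int(r["Deaths"]))["Leading Cause"]
```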
|firstname.lastname@example.org||The data gives a representation of where people have been getting vaccines and|
additional information about the patient that could give hints towards their
|26 columns and 885 rows.|
|email@example.com||Most popular baby names.||https://data.cityofnewyork.us/Health/Popular-Baby-Names/25th-nujf||You can download the data if you click on the link and click "export" and then|
"CSV" (like most of the files on the site). The data lists the year of birth,
gender, ethnicity, name, number of people with the name, and popularity rank.
|firstname.lastname@example.org||The data represents a ranking of baby names in New York City by frequency. The|
data includes ethnicity, gender, baby count for each name, and year.
|https://data.cityofnewyork.us/Health/Popular-Baby-Names/25th-nujf||To access the data, you can click on the export button and choose the format|
you would like to download it in. In addition, there is something really
interesting about this data set. You can create a visualization and view it in
many different formats and graphs.
|email@example.com||The data describes popular birth names and their genders. I used the NYC Open|
Data info for this.
|https://data.cityofnewyork.us/Health/Popular-Baby-Names/25th-nujf||You can download this as a csv file|
|firstname.lastname@example.org||Popular baby names||https://data.cityofnewyork.us/Health/Popular-Baby-Names/25th-nujf||Each sex and race has its own rankings. For example, Hispanic males and Hispanic|
females each have their own ranking list.
|email@example.com||Popular baby names for boys and girls of different ethnic groups||https://data.cityofnewyork.us/Health/Popular-Baby-Names/25th-nujf/data||No|
|firstname.lastname@example.org||The number of arrests per race each year||https://data.cityofnewyork.us/Public-Safety/Crime-Enforcement-Activity/qk6i-zcht||The button "Crime Enforcement Activity" lets you save the documents. The table|
consists of percentages for each race.
|email@example.com||This data reports the admission of inmates into NYC Department of Correction|
facilities. The 7 columns (from left to right) describe Inmate ID, Date of
Admission, Date of Discharge (if applicable), Race, Gender (M or F), Inmate
Status Code, and the code of the inmate's top legal charge.
|https://data.cityofnewyork.us/Public-Safety/Inmate-Admissions/6teu-xtgp||This data can be downloaded as a CSV file. One should note that some of the|
rows of data do not have complete information (e.g., the discharge date is
unavailable, because the inmate that the information pertains to has not yet
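Because open admissions have no discharge date, a length-of-stay calculation has to allow for a blank. A sketch that measures still-open stays up to a chosen date (today by default). The MM/DD/YYYY format is an assumption; the real export may include a time component, in which case the format string needs adjusting:

```python
from datetime import date, datetime
from typing import Optional

DATE_FORMAT = "%m/%d/%Y"  # assumed date format; adjust to match the file


def days_in_custody(admitted: str, discharged: str,
                    as_of: Optional[date] = None) -> int:
    """Days from admission to discharge; a blank discharge is measured up to as_of (default: today)."""
    start = datetime.strptime(admitted.strip(), DATE_FORMAT).date()
    if discharged.strip():
        end = datetime.strptime(discharged.strip(), DATE_FORMAT).date()
    else:
        end = as_of or date.today()
    return (end - start).days
```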
|firstname.lastname@example.org||NYPD Arrest Data||https://data.cityofnewyork.us/Public-Safety/NYPD-Arrest-Data-Year-to-Date-/uip8-fykc||- Level of offense is sorted into felony (F), misdemeanor (M), and violation|
(V). - Data can also be sorted by race and sex of offender, borough, type of
crime, age group, and even latitude and longitude (among other things).
- More than 200K rows.
|email@example.com||Directory Of Toilets In Public Parks||https://data.cityofnewyork.us/Recreation/Directory-Of-Toilets-In-Public-Parks/hjae-yuav||The table of data can be downloaded from the NYC OpenData page at the link.|
|firstname.lastname@example.org||Eateries (e.g. snack bars, food carts, mobile food trucks, restaurants) in NYC||https://data.cityofnewyork.us/Recreation/Directory-of-Eateries/8792-ebcp||You can access its JSON file. Python supports JSON out of the box: import the json|
module and read the file with the json.load() function.
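Note that json.load is a function in Python's built-in json module that takes an open file, not a method on the file itself. A minimal sketch; the file name in the usage comment is a placeholder, and the shape of the parsed result (a list or a dict) depends on which JSON export you download:

```python
import json
from typing import Any


def load_json(path: str) -> Any:
    """Parse a JSON file into Python lists, dicts, strings, and numbers."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Usage (hypothetical file name):
# eateries = load_json("Directory_of_Eateries.json")
# print(type(eateries))
```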
|email@example.com||New York Public Library Stats||https://data.cityofnewyork.us/Recreation/New-York-Public-Library-NYPL-Branch-Services-from|
|There is a lot of information that can be sorted with it.|
|firstname.lastname@example.org||The data gives addresses, phone numbers, and website links to theaters in NYC.||https://data.cityofnewyork.us/Recreation/Theaters/kdu2-865w||I downloaded it as a .csv file. It can be opened using gedit or TextEdit|
(and probably other text editors too), but then all the data is smushed
together. To read it as a full, readable table, I opened it using the Numbers
|email@example.com||It's all the 311 service requests made in 2006||https://data.cityofnewyork.us/Social-Services/311-Service-Requests-for-2006/hy4q-igkk||There are a lot of other columns for more specific requests or crimes that are|
left blank; really only the first 10 or so columns have anything in them.
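To see which columns are mostly empty without scrolling through the whole table, you can compute the fraction of rows that have a value in each column. A sketch using csv.DictReader on an already-open file; the function name is hypothetical:

```python
import csv
from typing import Dict, TextIO


def fill_rates(f: TextIO) -> Dict[str, float]:
    """Fraction of rows with a non-blank value, per column."""
    reader = csv.DictReader(f)
    filled = {name: 0 for name in reader.fieldnames or []}
    rows = 0
    for row in reader:
        rows += 1
        for name in filled:
            if (row.get(name) or "").strip():
                filled[name] += 1
    return {name: (count / rows if rows else 0.0) for name, count in filled.items()}


# Usage (hypothetical file name):
# with open("311_Service_Requests_2006.csv", newline="") as f:
#     print(fill_rates(f))
```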
|firstname.lastname@example.org||Abuse/neglect rate per county||https://data.cityofnewyork.us/Social-Services/Abuse-Neglect-by-Community-District-CD-/rnjn|
|email@example.com||Shows the population of homeless people in an area of NYC.||https://data.cityofnewyork.us/Social-Services/Directory-Of-Homeless-Population-By-Year/5t4|
|Press export -> Download -> and then download as CSV.|
|firstname.lastname@example.org||Taxi stats||https://data.cityofnewyork.us/Transportation/2015-Yellow-Taxi-Trip-Data/ba8s-jw6u||Comes with graphs as well. Columns to note: passenger count, trip distance,|
fare amount, tips, and payment type.
|email@example.com||Closed street potholes inspected + repaired by NYCDOT||https://data.cityofnewyork.us/Transportation/Street-Pothole-Work-Orders-Closed-Dataset-/x9|
|Sort by newest, scroll all the way down, and you can export as CSV. It is a|
bit lengthy, with 290,000 rows and 16 columns, which is a lot to process.
|firstname.lastname@example.org||Traffic Volume||https://data.cityofnewyork.us/Transportation/Traffic-Volume-Counts-2012-2013-/p424-amsu||Only covers 2012-2013, so it is not recent. It is a CSV file if you click Export.|
|email@example.com||Locations of farmers markets throughout NYC with the days they are open and the|
|https://data.cityofnewyork.us/dataset/DOHMH-Farmers-Markets/8vwk-6iz2/data||Note that most of the data are strings rather than numbers. You can still|
analyze it though.
|firstname.lastname@example.org||emission of varied greenhouse gases (kg per capita) by different countries over|
|https://data.oecd.org/air/air-and-ghg-emissions.htm#indicator-chart||the data can be downloaded as a csv file that contains the specific greenhouse|
gas (or all the greenhouse gases), kg per capita, country, and year.
|email@example.com||San Francisco plant and ecosystem data||https://data.sfgov.org/Energy-and-Environment/San-Francisco-Plant-Finder-Data/vmnk-skih||Click "Export", next to "View Data" and "Visualize" then select csv|
|firstname.lastname@example.org||Traffic Speed Data||https://data.world/ahalps/social-influence-on-shopping||They make you sign up for their website to be able to access the data, so it is|
kind of annoying, but it is in .csv format.
|email@example.com||DC voter registration data||https://data.world/codefordc/dc-voter-registration-data/workspace/file?filename=20141218-d|
|It asks you to make a free account (or sign in with Google), then you can|
|firstname.lastname@example.org||Many financial statistics||https://data.worldbank.org/||It's free|
|email@example.com||Lists the GDP per capita for every recognized country in the world, for the|
last approximately 50 years.
|https://data.worldbank.org/indicator/NY.GDP.PCAP.CD?view=map||On the right, click Download and choose 'CSV'.|
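The downloaded CSV may start with a few lines of metadata before the real header row. Instead of hard-coding how many lines to skip, this sketch looks for the first row whose first cell is "Country Name" (an assumption about the header; change first_header if the file differs) and reads the rows after it as dicts:

```python
import csv
from typing import Dict, List, TextIO


def read_world_bank_csv(f: TextIO, first_header: str = "Country Name") -> List[Dict[str, str]]:
    """Skip any leading metadata lines, then read rows as dicts keyed by the header row."""
    reader = csv.reader(f)
    for row in reader:
        # lstrip removes a UTF-8 byte-order mark if the header is the very first line
        if row and row[0].lstrip("\ufeff").strip() == first_header:
            header = row
            break
    else:
        raise ValueError(f"no header row starting with {first_header!r} found")
    return [dict(zip(header, row)) for row in reader if row]
```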
|firstname.lastname@example.org||jk that wasn't the last one. here's one more dataset of datasets, including|
(among other things) walruses, doggies, music, hate crimes, and food!
|email@example.com||the free encyclopedia, and other things||https://dumps.wikimedia.org/||ok this is the last one|
|firstname.lastname@example.org||Top Grossing Movies from 2007-11||https://gist.githubusercontent.com/Beelat2018/29fb23175c5161797f17fc1f970824fb/raw/df29c04|
|Sourced from Github, user Beelat2018, alternate link here:|
|email@example.com||perfectly appropriate and clean words||https://github.com/LDNOOBW/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words||no|
|firstname.lastname@example.org||another list of datasets||https://github.com/awesomedata/awesome-public-datasets||no|
|email@example.com||Coronavirus data in the US by county, except for a few areas, which are by city.||https://github.com/nytimes/covid-19-data/blob/master/us-counties.csv||It is only viewable on github as raw data.|
|firstname.lastname@example.org||personalized license plates||https://github.com/veltman/ca-license-plates||no|
|email@example.com||Data sets relating to health, compiled by the government. They have everything|
from the cost of hospital trips to data as recent as COVID cases and tests.
|https://healthdata.gov/||It's a super easy website to navigate. You can search for keywords, locations,|
or the date the data was last modified. Once you find a data set that interests
you, you can either preview it, or download it as a CSV.
|firstname.lastname@example.org||The data is about the NFL quarterbacks in the 2011 season. It contains their|
names, the team they played on, and statistics about them. These statistics
include the number of fumbles during that season, the number of sacks, and the
number of rushing touchdowns. These are just three types of statistics included
in this file.
|After entering the link, you will see another link with the words “The files|
are located here.” Clicking that link will take you to another page with
several files and folders. The file I downloaded was in the NFL folder. After
clicking on that folder, you will see more files and folders. The file I picked
is called “NFL 2011 QB.csv.” Just right click it (or control click on a Mac)
and download the file.
|email@example.com||Describes the current and past weather at any given location.||https://openweathermap.org/api||You need to import a JSON library and an HTTP library to use the API (which|
also requires a key). First, build the URL using the ZIP code you want, and
then request it to get the JSON. Any fully implemented JSON parser can do the
job. ~15 LOC
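Those steps fit in about 15 lines with Python's standard library (urllib for HTTP, json for parsing). This is a sketch assuming OpenWeatherMap's current-weather endpoint with its zip and appid query parameters; the API key is your own, and the fetch needs network access:

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict

BASE_URL = "https://api.openweathermap.org/data/2.5/weather"  # current-weather endpoint


def build_url(zip_code: str, api_key: str, country: str = "us") -> str:
    """Build the request URL for current weather at a ZIP code."""
    query = urllib.parse.urlencode({"zip": f"{zip_code},{country}", "appid": api_key})
    return f"{BASE_URL}?{query}"


def current_weather(zip_code: str, api_key: str) -> Dict[str, Any]:
    """Fetch the URL and parse the JSON response (needs network access and a valid key)."""
    with urllib.request.urlopen(build_url(zip_code, api_key), timeout=10) as resp:
        return json.load(resp)


# Usage (requires a real key):
# print(current_weather("10001", "YOUR_API_KEY"))
```

Keeping URL building separate from the network call makes the first part easy to check without a key.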
|firstname.lastname@example.org||My data is a report on the total number of coronavirus cases in every country|
by date, from 12/31/2019 to 4/27/2020.
|https://ourworldindata.org/coronavirus-source-data||Just look for the one that says total confirmed cases. I chose this link|
because this data relates to recent news and I just thought it'd be interesting
in general to see which countries became infected in relation to previously
|email@example.com||Agriculture statistics, including land use.||https://quickstats.nass.usda.gov||When you retrieve data for an entry, pressing the "spreadsheet" button allows|
you to obtain a .csv file of the data.
|firstname.lastname@example.org||set of sets||https://vincentarelbundock.github.io/Rdatasets/datasets.html||no|
|email@example.com||Historic surface temperature of Raleigh city, US||https://www.chapelhillopendata.org/api/v2/catalog/datasets/earth-surface-temperature-data0|
|when you open the link, it will send the csv data to your downloads. It has six|
columns and is formatted very similarly to the SAT csv files we have gotten
before. Unlike before, if there is no data, there will not be an s to fill the
blank. The last three columns show the same data in every row.
|firstname.lastname@example.org||The data I found is about Citi Bike members and their rides.||https://www.citibikenyc.com/system-data||It includes the following:
Trip Duration (seconds)
Start Time and Date
Stop Time and Date
Start Station Name
End Station Name
Station ID
Station Lat/Long
Bike ID
User Type (Customer = 24-hour pass or 3-day pass user; Subscriber = Annual Member)
Gender (Zero=unknown; 1=male; 2=female)
Year of Birth
The data is divided into monthly files and runs from June 2013 until March of
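The coded fields listed above (trip duration in seconds, user type, gender codes) can be decoded per row with a small lookup. This is a sketch: the CSV header names used here (`tripduration`, `usertype`, `gender`, `start station name`, `end station name`) are assumptions about the trip files, so check them against the month you download.

```python
USER_TYPES = {"Customer": "24-hour or 3-day pass", "Subscriber": "annual member"}
GENDERS = {"0": "unknown", "1": "male", "2": "female"}


def describe_trip(row: dict) -> dict:
    """Turn one raw trip row (e.g. from csv.DictReader) into readable values."""
    return {
        "minutes": round(int(row["tripduration"]) / 60, 1),  # duration is in seconds
        "user": USER_TYPES.get(row["usertype"], "other"),
        "gender": GENDERS.get(row["gender"], "unknown"),
        "start": row["start station name"],
        "end": row["end station name"],
    }
```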
|email@example.com||It’s a collection of over 200,000 datasets of US government data that are|
separated into 14 different topics.
|https://www.data.gov/||There are many, many different data sets and data tables that are relatively|
easy to access, and most of them are downloadable.
|firstname.lastname@example.org||The data I present is team-by-team baseball stats for the 2019 season. The|
stats are each team's hitting stats, for example batting average, number of
home runs, etc.
|https://www.espn.com/mlb/stats/team/_/view/batting||The chart will be there as soon as you open the link. However, to get it|
as a CSV file, you have to go through Excel: import the table with Excel's
"From Web" option, then save it as a CSV.
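As an alternative to the Excel route, a saved copy of the page could be converted with Python's built-in `html.parser`. This only works if the stats table is present in the page's HTML rather than drawn later by JavaScript, which is an assumption. The sketch collects every `<th>`/`<td>` cell row by row and writes the rows out as CSV. The class and function names are hypothetical.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html: str) -> str:
    """Return the table rows found in `html` as CSV text."""
    parser = TableParser()
    parser.feed(html)
    out = io.StringIO()
    csv.writer(out).writerows(parser.rows)
    return out.getvalue()
```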
|email@example.com||This dataset looks at all the songs in each of Spotify's "All Out..." decade|
playlists and their attributes, such as bpm or danceability.
|https://www.kaggle.com/cnic92/spotify-past-decades-songs-50s10s#1950.csv||Each decade (50s to New 10s) has its own CSV file, and there is no file|
from the link that combines all of them. However, all of the files are in the
same format, so they are easy to combine. These attributes are measured by an
API, which tries to calculate things such as Beats Per Minute, Acousticness,
and even how "happy" a song is. The attributes use different ranges, but
generally, the larger a number is, the more the song matches the quality being measured.
More details about this API can be found here:
|firstname.lastname@example.org||It takes the data of songs found in Spotify's "All out..." decade playlists and|
lists the attributes found from Spotify's API.
|https://www.kaggle.com/cnic92/spotify-past-decades-songs-50s10s/version/9||While downloading this data is free, you need to make a Kaggle account (which|
is also free). There are 7 separate data sheets (one for each decade). If you
want to have a spreadsheet that includes all the decades, you will need to make
one yourself. It is easy since all the spreadsheets follow the same format. As
said previously, the attributes of each song are measured by an API Spotify has
that measures certain qualities of a song. These include Beats Per Minute,
whether it is acoustic or not, and even how happy a song is. They all have
different ranges of values, but usually the higher the number, the more of that
quality the song has. More information can be found here:
eatures/ disregard my last submission, this one has more info.
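Since the per-decade files share one format, building the combined spreadsheet described above can be scripted. This is a minimal sketch. It assumes every file in the folder has the same header and is named after its decade (as the `1950.csv` in the link suggests), and it defaults to UTF-8 encoding, which you may need to change. It adds a `decade` column taken from each file name.

```python
import csv
from pathlib import Path


def combine_decades(folder: str, output: str, encoding: str = "utf-8") -> int:
    """Merge every per-decade CSV in `folder` into one file with an extra 'decade' column.

    Assumes all files share the same header. Returns the number of song rows written.
    """
    out_path = Path(output).resolve()
    # Skip the output file in case it is saved inside the same folder.
    files = [p for p in sorted(Path(folder).glob("*.csv")) if p.resolve() != out_path]
    rows_written = 0
    writer = None
    with open(out_path, "w", newline="", encoding=encoding) as out:
        for path in files:
            with open(path, newline="", encoding=encoding) as f:
                reader = csv.DictReader(f)
                if writer is None:
                    # Take the header from the first file and prepend the new column.
                    writer = csv.DictWriter(out, fieldnames=["decade", *reader.fieldnames])
                    writer.writeheader()
                for row in reader:
                    writer.writerow({"decade": path.stem, **row})
                    rows_written += 1
    return rows_written
```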
|email@example.com||There is a plethora of data in this downloadable file: it has 16 columns|
of Airbnb listing data. These columns include listing name, host name, price,
neighborhood, and reviews.
|https://www.kaggle.com/dgomonov/new-york-city-airbnb-open-data/data#AB_NYC_2019.csv||There are thousands of listings, and accessing the data is relatively|
easy. Simply visit the link above to reach the Kaggle website (it did make me
log in with Google, however). When you reach the page, click the box that says
"Download"; the file is 7 MB and automatically downloads as a CSV. The CSV has
gaps in some rows because not all of the listings have received reviews and
ratings, which are columns in the table.
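Because unreviewed listings leave the review columns blank, any summary has to skip or count those gaps explicitly. A minimal sketch follows. It assumes the header names `neighbourhood_group`, `price`, and `last_review`; verify them in your copy of `AB_NYC_2019.csv`.

```python
import csv
import io
from collections import defaultdict


def summarize_listings(csv_text: str) -> dict:
    """Average price per neighbourhood group, plus how many listings have no reviews yet."""
    prices = defaultdict(list)
    never_reviewed = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        prices[row["neighbourhood_group"]].append(float(row["price"]))
        if not row["last_review"]:  # blank when the listing has never been reviewed
            never_reviewed += 1
    averages = {group: sum(p) / len(p) for group, p in prices.items()}
    return {"average_price": averages, "never_reviewed": never_reviewed}
```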
|firstname.lastname@example.org||This dataset contains information on player reconnaissance in over 500|
professional-level Starcraft games. From the perspective of one player (the
Terran), it contains information on how many enemy (Protoss) units the player
has observed, can observe, has seen destroyed, etc., along with an overall
measure of how much enemy territory the player can see.
|https://www.kaggle.com/kinguistics/starcraft-scouting-the-enemy||You have to make a Kaggle account to download it.|
|email@example.com||Wins, losses, ties, and win percentage of all 32 NFL teams in the 2019|
season as well as what division they play in.
|https://www.pro-football-reference.com||When you click on one of the teams, it shows you extra information.|
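In the NFL, a tie counts as half a win and half a loss, so win percentage is (W + 0.5·T) / (W + L + T). A small helper (hypothetical name) is handy for recomputing or checking the column:

```python
def win_percentage(wins: int, losses: int, ties: int = 0) -> float:
    """NFL-style win percentage: each tie counts as half a win."""
    games = wins + losses + ties
    if games == 0:
        return 0.0
    return (wins + 0.5 * ties) / games
```

For example, an 8-7-1 record gives 8.5 / 16 = 0.53125.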
|firstname.lastname@example.org||Play-by-play (more recently, pitch-by-pitch) data for nearly every MLB game|
|https://www.retrosheet.org/game.htm||This might not be very interesting for those not into baseball.
When you download the data, it will appear as a .zip file. Once you open it, the
files will have unusual filename extensions that indicate what's in each file
(specifically, whether it's an event log or a roster, and the league: American
or National). All of the files are in CSV
format. https://www.retrosheet.org/datause.txt and
https://www.retrosheet.org/eventfile.htm have a good explanation for what all
of the notation means and what should be expected for each line. For example,
"play,2,1,jeted001,11,BS1.X,63/G" means that in the bottom of the second
inning, Derek Jeter took a ball, then had a swinging strike, then the pitcher
threw a pickoff throw to first, then he hit a ground ball at the shortstop who
threw it to first base for a putout.
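The example record above can be split into its fields mechanically: record type, inning, home/visitor flag (1 = home team batting, so the bottom of the inning), batter ID, count, pitch sequence, and event. The sketch below decodes only a partial table of pitch codes and skips marker characters such as `.` (a play not involving the batter). The full code list is in `eventfile.htm`. `parse_play` and `PITCH_CODES` are hypothetical names.

```python
# Partial table of Retrosheet pitch codes; see eventfile.htm for the full list.
PITCH_CODES = {
    "B": "ball",
    "C": "called strike",
    "F": "foul",
    "S": "swinging strike",
    "X": "ball put into play",
    "1": "pickoff throw to first",
    "2": "pickoff throw to second",
    "3": "pickoff throw to third",
}


def parse_play(line: str) -> dict:
    """Split one 'play' record from an event file into named fields."""
    kind, inning, team, batter, count, pitches, event = line.strip().split(",", 6)
    if kind != "play":
        raise ValueError(f"not a play record: {line!r}")
    if len(count) == 2 and count.isdigit():
        balls, strikes = int(count[0]), int(count[1])
    else:  # older games may record the count as unknown (e.g. "??")
        balls = strikes = None
    return {
        "inning": int(inning),
        "half": "bottom" if team == "1" else "top",
        "batter": batter,
        "balls": balls,
        "strikes": strikes,
        # Skip markers like "." and decode the remaining pitch letters/digits.
        "pitches": [PITCH_CODES.get(c, f"other ({c})") for c in pitches if c.isalnum()],
        "event": event,
    }
```

Applied to `play,2,1,jeted001,11,BS1.X,63/G`, it yields the bottom of the 2nd, a 1-1 count, and the pitches ball, swinging strike, pickoff throw to first, and ball put into play, with event `63/G`.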
|email@example.com||2018 Census results on topics such as ethnic groups and hours worked per week||https://www.stats.govt.nz/assets/Uploads/2018-Census-totals-by-topic/Download-data/2018-ce|
|https://www.stats.govt.nz/large-datasets/csv-files-for-download/ (I found it on|
this website and scrolled down to the first link in the Census area)
|firstname.lastname@example.org||Abortion rates sorted by age of women||https://www.stats.govt.nz/large-datasets/csv-files-for-download/||You can scroll down, download this file, and open it by unzipping it. This|
website also contains several other datasets, and this file (and probably all
the others) contains more than one dataset.
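Since these downloads arrive as .zip archives that may hold several CSV files, Python's built-in `zipfile` module can list and read them without extracting anything to disk. A minimal sketch; the helper names are hypothetical, and UTF-8 is an assumed default encoding.

```python
import zipfile


def csv_members(zip_source) -> list:
    """List the CSV files inside a downloaded .zip (a path or a binary file object)."""
    with zipfile.ZipFile(zip_source) as archive:
        return [name for name in archive.namelist() if name.lower().endswith(".csv")]


def read_member(zip_source, name: str, encoding: str = "utf-8") -> str:
    """Read one file out of the archive as text."""
    with zipfile.ZipFile(zip_source) as archive:
        return archive.read(name).decode(encoding)
```

Calling `csv_members("download.zip")` first shows how many datasets the archive actually contains.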
|email@example.com||This collection of data is about the film industry, specifically the budget,|
national and global box office sales, dates, titles, etc., about specific movies,
franchises, or groupings of movies.
|https://www.the-numbers.com/||For our purposes, it is probably best to look at the box office, home video,|
and movie categories to see domestic and international data. Some tables show
percent change along with the categories listed above and other useful
information in judging the success of a film.
|firstname.lastname@example.org||Mean BMI of males and females older than 18, from 1983 to 2016, for every|
country in the world, published by the WHO.
|You will have to choose "Export file". Then, under file format, you will|
have the option to choose "CSV 30000 row". Finally, just click Export and you
will have all this juicy information.
|email@example.com||This data is historical. It counts how many cases of murder, rape, robbery, felony|
assault, burglary, grand larceny, and grand larceny of a motor vehicle there
were in each year from 2000 to 2019.
|It looks like a simple 9 by 21 table, and I was able to download the CSV file.
You can find the download link by searching for citywide crime statistics on NYC
Open Data, clicking SHTML, and clicking historical crime stats. Then, download
the file for Citywide Seven Major Felony Offenses. Of course, I don't
completely trust crime statistics because arrests are never conclusive; false
accusations happen a lot. In addition, these are only reported cases. Not
everyone is going to call the cops when they are assaulted.
|firstname.lastname@example.org||It is a record of all the stop, question, and frisks performed in NYC in 2019.|
It contains a massive amount of data about basically anything one could know
about that situation.
|https://www1.nyc.gov/site/nypd/stats/reports-analysis/stopfrisk.page||I chose the link at the top, the 2019 excel file.|