Web scraping has become an essential tool for extracting data from websites in various industries.
However, understanding the terminology associated with web scraping can sometimes be challenging.
In this blog post, we provide a comprehensive glossary of terms to help you navigate the world of web scraping with ease.
Whether you are new to data extraction or a seasoned professional, this glossary will serve as a handy reference to ensure you stay well-informed.
1. Account
An account represents an individual customer account, a business, or even a partner organization with whom we do business. It serves as the basis for managing and organizing data scraping projects.
2. Account Owner
The Account Owner is a designated point of contact from Grepsr responsible for delivery, support, and account expansion. This role is reserved for certain account types and ensures smooth communication and coordination between the customer and Grepsr.
3. Data Platform
The Data Platform is Grepsr’s proprietary, enterprise-grade system for data project management. It consists of two complementary pieces: the backend infrastructure, which handles data extraction and management, and the frontend interface, which enables users to configure and monitor their scraping projects.
4. Data Project
A project is a vehicle through which customer requirements are translated into workable data, and value is delivered. It includes data requirements such as URLs and data points to extract, as well as additional instructions required to pull data effectively.
5. Data Report
Project requirements are grouped into sets called Reports. A Report represents a use case or a granular set of data and delivery requirements. Reports can be executed at once and delivered together. Each Report is associated with a set of programmatic instructions to source data, known as a Crawler or Service.
6. Data Crawler (or Spider)
A Crawler programmatically opens and interacts with a website to parse content and extract data. It is versioned to reflect changes in the data scope over time. A successful Project therefore has at least one Report associated with a unique Crawler version.
7. Run
A Run is the execution of a Crawler. It retrieves data from the target website based on the defined instructions and configuration.
8. Dataset
A Dataset is the data output resulting from a Run. It contains the extracted data in a structured format ready for analysis and processing.
9. Page
Pages within a Dataset are similar to sheets in a spreadsheet. Each Dataset consists of at least one Page, which allows the final output to be normalized, much like tables in a relational database, keeping distinct concerns separate.
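As a minimal sketch of this idea (the field names and Page names here are hypothetical, not Grepsr's actual schema), a Dataset might be split into two Pages linked by a shared key, just as normalized tables in a relational database are:

```python
# Hypothetical Dataset with two Pages linked by "product_id" --
# analogous to two sheets in a spreadsheet or two normalized tables
dataset = {
    "products": [
        {"product_id": 1, "name": "Widget", "price": 9.99},
    ],
    "reviews": [
        {"product_id": 1, "rating": 5, "text": "Great!"},
        {"product_id": 1, "rating": 4, "text": "Works well"},
    ],
}

# Joining the Pages back together, as one would with related tables
for review in dataset["reviews"]:
    product = next(p for p in dataset["products"]
                   if p["product_id"] == review["product_id"])
    print(product["name"], review["rating"])
```

Keeping reviews on their own Page avoids duplicating product details on every review row.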
10. Columns
Columns are the extracted fields in a Dataset or a Page in a Dataset. They organize the data and provide a clear structure to the extracted information.
11. Indexed Column
Indexing a column is a crucial process in database management. It implies that the generated data output for that particular column is stored in a way that allows filtering, sorting, and searching across millions of records without any delay.
12. Rows
Each line of record in a Dataset is a Row. Rows contain the extracted data for each specific instance or entry.
13. Object
In a JSON output, a Row of records is an Object. Unlike a Row, an Object can be layered, allowing for a more complex structure of data representation.
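To illustrate the difference (with hypothetical field names), the same record can be represented as a flat Row or as a layered JSON Object that nests related data:

```python
import json

# A flat Row, as it might appear in a CSV export (hypothetical fields)
row = {"product": "Widget", "price": 9.99, "review_count": 42}

# The same record as a layered JSON Object, nesting related data
obj = {
    "product": "Widget",
    "price": 9.99,
    "reviews": {"count": 42, "latest": ["Great!", "Works well"]},
}

print(json.dumps(obj, indent=2))
```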
14. Data Quality
Quality is an umbrella term to measure the quantitative, qualitative, and overall health of a Report. It takes various factors into consideration. It includes Accuracy, Completeness, Data Distribution, Rows, and Requests.
15. Data Accuracy
Accuracy is a numeric score, expressed as a percentage, that measures whether the sourced data complies with the expected data format. Rules assigned to different Columns in a Dataset validate this compliance, so a higher Accuracy score indicates better adherence to data standards.
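A rough sketch of how such a score could be computed (the column names and regex rules below are hypothetical, not Grepsr's actual validation rules):

```python
import re

# Hypothetical validation rules: each Column maps to a format its cells must match
rules = {
    "price": re.compile(r"^\d+\.\d{2}$"),
    "sku": re.compile(r"^[A-Z]{3}-\d{4}$"),
}

rows = [
    {"price": "19.99", "sku": "ABC-1234"},
    {"price": "N/A", "sku": "XYZ-5678"},  # invalid price format
]

def accuracy(rows, rules):
    """Percentage of cells that comply with their Column's format rule."""
    checked = valid = 0
    for row in rows:
        for col, pattern in rules.items():
            checked += 1
            if pattern.fullmatch(str(row.get(col, ""))):
                valid += 1
    return 100.0 * valid / checked

print(f"Accuracy: {accuracy(rows, rules):.1f}%")  # 3 of 4 cells comply
```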
16. Data Completeness
Completeness refers to the state where the data contains all the information available to extract from the source. It is measured by the Fill Rate, which calculates the data density within the Dataset.
17. Fill Rate
The Fill Rate is a numeric score, expressed as a percentage, that measures the data density within a Dataset: the proportion of cells with data versus empty cells. A higher Fill Rate signifies a more complete Dataset.
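In essence, the Fill Rate is the share of non-empty cells. A minimal sketch, assuming empty cells appear as `None` or empty strings (the column names are hypothetical):

```python
def fill_rate(rows, columns):
    """Percentage of non-empty cells across all rows and columns."""
    total = len(rows) * len(columns)
    filled = sum(
        1 for row in rows for col in columns
        if row.get(col) not in (None, "")
    )
    return 100.0 * filled / total if total else 0.0

columns = ["name", "price", "rating"]
rows = [
    {"name": "Widget", "price": "9.99", "rating": "4.5"},
    {"name": "Gadget", "price": "", "rating": None},  # two empty cells
]

print(f"Fill Rate: {fill_rate(rows, columns):.1f}%")  # 4 of 6 cells filled
```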
18. Data Distribution
Data Distribution measures the occurrence of a certain value in a Column. It is particularly useful for Indexed Columns and acts as a proxy for data quality: a distribution that deviates from the norm may indicate issues with the sourced data.
19. Data Crawler Requests
A Request is an HTTP request made to the server to retrieve content. The Crawler makes a series of Requests to load and interact with a web page and extract the necessary data. Each Request is either served by the server or fails, indicating an error.
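A single Request can be sketched with Python's standard library (the URL is a placeholder; a real Crawler issues many such Requests per page):

```python
from urllib.request import urlopen
from urllib.error import URLError, HTTPError

# Placeholder target URL, not a real scraping endpoint
url = "https://example.com/"

try:
    with urlopen(url, timeout=10) as response:
        body = response.read()          # content served by the server
        print(f"Served: HTTP {response.status}, {len(body)} bytes")
except (HTTPError, URLError) as exc:    # failed Request, indicating an error
    print(f"Request failed: {exc}")
```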
20. Team
A Team refers to a set of users belonging to the same Account. Teams can have different roles, such as Team Manager or Viewer. The Team Manager has administrative rights and access to all Projects in the Account, while the Viewer has limited rights and access only to specific added Projects.
In conclusion
Web scraping is a powerful technique for extracting data from websites, and understanding the associated terminology is essential. This glossary gives you a comprehensive reference to help you navigate the world of web scraping with confidence.
Whether you are a beginner or an experienced user, a clear understanding of these terms will empower you to leverage web scraping effectively in your data-driven projects.