Aim: Assign new customers to segments and predict how much revenue each is likely to generate.

This real-world customer dataset with 31 variables describes 83,590 instances (customers) from a hotel in Lisbon, Portugal.

The data covers three full years of personal, behavioral, demographic, and geographic information about the customers.

Information about the variables used in the study

LodgingRevenue: Total amount spent by the customer on lodging (in euros). This value includes room, crib, and other lodging-related expenses.

OtherRevenue: Total amount spent by the customer on other expenses (in euros). This value includes food, beverage, spa, and other expenses.
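Since the aim is to predict how much revenue a customer generates, one natural per-customer target is the sum of the two revenue columns. The sketch below assumes the data is stored as a CSV file whose headers match the variable names above; the two sample rows and their DocIDHash values are made-up placeholders, not records from the dataset.

```python
import csv
import io


def total_revenue(row: dict) -> float:
    """Return LodgingRevenue + OtherRevenue (euros) for one customer row.

    csv.DictReader yields every value as a string, so both are converted to float.
    """
    return float(row["LodgingRevenue"]) + float(row["OtherRevenue"])


# Hypothetical in-memory CSV with only the relevant columns (not real data).
sample = io.StringIO(
    "DocIDHash,LodgingRevenue,OtherRevenue\n"
    "placeholder_a,250.0,40.5\n"
    "placeholder_b,0,0\n"
)
totals = [total_revenue(r) for r in csv.DictReader(sample)]
print(totals)  # [290.5, 0.0]
```

With the real file, `io.StringIO(...)` would be replaced by `open(path, newline="")`, where `path` is whatever the local CSV is called.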

DocIDHash: SHA-256 hash string of the identification document number the customer provided at check-in (passport number, national ID card number, or other).
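A hash such as the one in DocIDHash is deterministic: the same document number always yields the same 64-character hex string, so records from the same document can be matched without storing the number itself. The minimal sketch below uses Python's `hashlib`; how the hotel normalized the document number before hashing (case, spacing, encoding) is not stated, so hashing the raw UTF-8 string is an assumption, and `hash_doc_id` is a hypothetical helper name.

```python
import hashlib


def hash_doc_id(doc_id: str) -> str:
    """Return the hex SHA-256 digest of an ID-document number.

    Assumption: the string is hashed exactly as given, encoded as UTF-8,
    with no normalization applied beforehand.
    """
    return hashlib.sha256(doc_id.encode("utf-8")).hexdigest()


digest = hash_doc_id("X1234567")  # hypothetical passport number
print(len(digest))  # 64
```

Because any change in normalization changes the digest, a hash computed this way will only match DocIDHash if it reproduces the hotel's exact input format.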

Nationality: Country of origin. Categories are represented as three-letter ISO 3166-1 alpha-3 country codes (e.g., PRT for Portugal).

DistributionChannel: Distribution channel the customer usually uses to make bookings at the hotel.
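A very simple baseline that links the segmentation and prediction goals is to treat each DistributionChannel value as a segment and predict a new customer's revenue as that segment's mean total revenue. This is only an illustrative sketch, not the method used in the project; the channel names and revenue figures in `rows` are hypothetical, and revenue values are assumed to be already numeric.

```python
from collections import defaultdict
from statistics import mean


def mean_revenue_by_channel(rows):
    """Group customers by DistributionChannel and average their total revenue.

    Each row is a dict with DistributionChannel, LodgingRevenue and
    OtherRevenue keys; revenue values are numbers in euros.
    """
    groups = defaultdict(list)
    for row in rows:
        total = row["LodgingRevenue"] + row["OtherRevenue"]
        groups[row["DistributionChannel"]].append(total)
    return {channel: mean(values) for channel, values in groups.items()}


# Hypothetical rows for illustration only; not drawn from the dataset.
rows = [
    {"DistributionChannel": "Direct", "LodgingRevenue": 300.0, "OtherRevenue": 60.0},
    {"DistributionChannel": "Direct", "LodgingRevenue": 100.0, "OtherRevenue": 20.0},
    {"DistributionChannel": "Corporate", "LodgingRevenue": 200.0, "OtherRevenue": 0.0},
]
result = mean_revenue_by_channel(rows)
print(result)  # {'Direct': 240.0, 'Corporate': 200.0}
```

A real model would combine more variables (e.g., Nationality and the behavioral features) and be evaluated on held-out customers; this per-channel mean is only a reference point to beat.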
