Check if you have training samples in your test set

Mar 22, 2022

Did it spill?

Did you manage to spill samples from your train set to your test set?

from did_it_spill import check_spill

spills = check_spill(train_loader, test_loader)

print(f"You have {len(spills)} spills in your test set!")

The library computes hashes of your data to determine whether any samples have spilled over from your train set into your test set. It currently supports PyTorch only.
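The idea behind hash-based checking can be sketched in plain Python. This is a minimal illustration under stated assumptions, not did_it_spill's actual implementation: `find_spills` and its `to_bytes` hook are hypothetical names, and each sample is assumed to be convertible to raw bytes. For PyTorch tensors, something like `tensor.detach().cpu().numpy().tobytes()` could serve as that conversion.

```python
import hashlib
from collections import defaultdict


def find_spills(first_samples, second_samples, to_bytes=bytes):
    """Return (first_index, second_index) pairs for samples whose hashes match.

    Hypothetical sketch of hash-based leak detection, not the library's code.
    `to_bytes` is an assumed hook that turns one sample into raw bytes.
    """
    # Index every sample of the first collection by the SHA-256 digest of its bytes
    first_hashes = defaultdict(list)
    for i, sample in enumerate(first_samples):
        digest = hashlib.sha256(to_bytes(sample)).hexdigest()
        first_hashes[digest].append(i)

    # Look up each sample of the second collection; every hit is a potential spill
    spills = []
    for j, sample in enumerate(second_samples):
        digest = hashlib.sha256(to_bytes(sample)).hexdigest()
        for i in first_hashes.get(digest, []):
            spills.append((i, j))
    return spills


train = [b"cat", b"dog", b"bird"]
test = [b"fish", b"dog", b"cat"]
print(find_spills(train, test))  # [(1, 1), (0, 2)]
```

Hashing each sample once and using a dictionary lookup keeps the check roughly linear in the number of samples, rather than comparing every train sample with every test sample. Note that this kind of check only catches byte-identical copies; an augmented or re-encoded version of a training sample would hash differently.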

Installation

pip install did-it-spill

Outputs

The function returns a list of tuples, one per leak. The first element of each tuple is the index at which the leak was found in the first loader, and the second element is the index at which it was found in the second loader.

Example output:

[(1244, 78), ..., (8774, 5431)]

The first leak was found at index 1244 in loader 1 and at index 78 in loader 2.
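If you decide to simply drop the leaked samples from the test set, the output can be turned into a list of test indices to keep. The sketch below assumes the loaders were not shuffled (so indices refer to dataset positions), that the train loader was passed first, and that indices count individual samples; `clean_test_indices` is a hypothetical helper, not part of did_it_spill. The spill list used here is a toy example.

```python
def clean_test_indices(spills, test_size):
    """Return the test-set indices that were NOT reported as spills.

    Hypothetical helper: assumes each spill is a (train_index, test_index)
    tuple and that test_size is the number of samples in the test set.
    """
    # Collect every test position that matched some training sample
    leaked = {test_index for _, test_index in spills}
    # Keep the remaining positions in their original order
    return [i for i in range(test_size) if i not in leaked]


toy_spills = [(3, 0), (7, 2), (9, 2)]
print(clean_test_indices(toy_spills, test_size=5))  # [1, 3, 4]
```

The resulting list could then be passed to `torch.utils.data.Subset(test_dataset, keep)` to build a test set without the leaked samples.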

Debugging spills

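The unthinkable happened. So what should you do now? First, make sure both loaders were created with shuffle=False; otherwise the reported indices will not line up with the positions of the samples in the underlying datasets.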

import torch

for index_train_set, index_test_set in spills:
    # Let's fetch both samples and double-check that they really are the same.
    # Indexing with [0] assumes each dataset item is a (data, label) pair.
    spilled_sample_train = train_dataset[index_train_set][0]
    spilled_sample_test = test_dataset[index_test_set][0]

    # This should always print True
    print(torch.equal(spilled_sample_train, spilled_sample_test))

    # From here on it's up to you; maybe plot the data?
    print(spilled_sample_train)
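Before inspecting samples one by one, it can help to check whether several entries share the same train index, i.e. a single training sample was found at more than one test position, which may point to duplicates in the raw data rather than only a bad split. A minimal sketch, assuming the (train_index, test_index) tuple layout described under Outputs; `group_spills_by_train_index` is a hypothetical helper, not part of did_it_spill, and the spill list is a toy example.

```python
from collections import defaultdict


def group_spills_by_train_index(spills):
    """Map each leaked train index to every test index it was found at.

    Hypothetical helper: assumes spills are (train_index, test_index) tuples.
    """
    groups = defaultdict(list)
    for train_index, test_index in spills:
        groups[train_index].append(test_index)
    return dict(groups)


toy_spills = [(3, 0), (3, 4), (7, 2)]
for train_index, test_indices in group_spills_by_train_index(toy_spills).items():
    # Report training samples that leaked into more than one test position
    if len(test_indices) > 1:
        print(f"Train sample {train_index} appears {len(test_indices)} times in the test set")
```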


John was the first writer to join pythonawesome.com. He has since instilled a very effective writing and reviewing culture at pythonawesome, which rivals have found impossible to imitate.
