
Fast.ai tabular data pass np.array

For future readers - why can't you get get_preds to work for a new df? (Tested on Kaggle's House Prices Advanced Regression Techniques.)

The root of the problem was in categorical nans. If you train your model with one set of cat features, say color = red, green, blue, and your new df has the colors red, green, blue, black, it will throw an error, because it won't know what to do with the new class (black). Not to mention you need to have the same columns everywhere, which can be tricky: if you use the FillMissing proc, like I did, it is nice, but it creates new columns for cat values (was missing or not). So you need to triple check these nans in the cats (a small checker is sketched right below).

For reference, the relevant DataLoader arguments:
batch_size (int): It is only provided for PyTorch compatibility.
shuffle (bool): If True, then data is shuffled every time the dataloader is fully read/iterated.
drop_last (bool): If True, then the last incomplete batch is dropped.
indexed (bool): The DataLoader will make a guess as to whether the dataset can be indexed (or is iterable).
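To make the class mismatch concrete, here is a small pandas-only checker; the function name and signature are mine, not from the post. It lists, for each categorical column, the classes that appear in the new df but were never seen in train.

import pandas as pd

# Hypothetical helper: report categorical columns whose test values
# contain classes that were never seen in train.
def unseen_classes(train: pd.DataFrame, test: pd.DataFrame, cat_cols):
    report = {}
    for col in cat_cols:
        extra = set(test[col].dropna().unique()) - set(train[col].dropna().unique())
        if extra:
            report[col] = sorted(extra)
    return report

# For the toy example above this would return {'color': ['black']}.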

I really wanted to make it work start to finish with fastai. The columns for train/test are identical; only train has one extra - the target.

combined = pd.concat([train, test]) # test will have nans at target, but we don't care
cont_cols, cat_cols = cont_cat_split(combined, max_card=50)

At this point there are different classes in some cat cols. I just decided to combine them (just to make it work), but doesn't it introduce leakage?

train[cont_cols] = train[cont_cols].astype('float') # if target is not float, there will be an error later
test[cont_cols[:-1]] = test[cont_cols[:-1]].astype('float') # slice the target off (I had mine at the end of cont_cols)

to.items gives us the transformed cat columns:

train_to_cat = to.items.iloc[:len(train), :] # transformed cats for train
test_to_cat = to.items.iloc[len(train):, :] # transformed cats for test
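The line that actually builds `to` isn't shown above, so here is a minimal sketch of what it presumably looked like, assuming the categorical encoding was fit on the cat columns of the combined frame so that train and test share one vocabulary. The frame passed in and the proc list are my guesses, not the author's code.

from fastai.tabular.all import TabularPandas, Categorify

# Assumed reconstruction: fit Categorify on the combined cat columns so the same
# class-to-code mapping covers both the train rows and the test rows.
to = TabularPandas(combined[cat_cols], procs=[Categorify], cat_names=cat_cols)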

After that, we need to assemble everything back together:

train_imp = pd.concat([train_to_cat, train[cont_cols]], axis=1) # assemble new cat and old cont together
test_imp = pd.concat([test_to_cat, test[cont_cols[:-1]]], axis=1) # exclude SalePrice
train_imp['SalePrice'] = np.log(train_imp['SalePrice']) # metric for kaggle

After that, we do as per the fastai tutorial:

splits = RandomSplitter(valid_pct=0.2)(range_of(train_imp))
cont_names = cont_cols[:-1] # we need to exclude the target
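The step that produces `dls` isn't shown either; a minimal sketch of the usual fastai tabular recipe over the assembled training frame, where the proc list and batch size are my assumptions:

from fastai.tabular.all import TabularPandas, Categorify, FillMissing, Normalize

# Assumed dataloaders step: wrap train_imp, split it, and build the dataloaders.
to_train = TabularPandas(train_imp,
                         procs=[Categorify, FillMissing, Normalize],
                         cat_names=cat_cols,
                         cont_names=cont_names,
                         y_names='SalePrice',
                         splits=splits)
dls = to_train.dataloaders(bs=64)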

learn = tabular_learner(dls, n_out=1, loss_func=F.mse_loss)
learn.fit_one_cycle(20, slice(1e-2, 1e-1), cbs=[...])

At this point we have a learner, but we still can't predict. I thought that after we do

dl = learn.dls.test_dl(test_imp, bs=64)
preds, _ = learn.get_preds(dl=dl) # get prediction

it would just work (preprocessing of cont values and predict), but no. So just find and fill the nans in test:

missing = test_imp.isnull().sum().sort_values(ascending=False).head(12).index.tolist()
test_imp = test_imp.fillna(test_imp.median())

After that we can finally predict:

dl = learn.dls.test_dl(test_imp, bs=64)
preds, _ = learn.get_preds(dl=dl)
final_preds = np.exp(preds.flatten()).tolist()
sub = pd.read_csv('./input/house-prices-advanced-regression-techniques/sample_submission.csv')
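The post stops at loading sample_submission; a plausible finishing step, assuming the standard House Prices submission layout (an Id column plus SalePrice), would be:

# Assumed last step, not from the post: write the predictions into the sample file.
sub['SalePrice'] = final_preds
sub.to_csv('submission.csv', index=False)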

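One possible tightening of the fill step above, offered as a sketch of my own rather than something from the post: compute the medians on train_imp instead of test_imp, so no test statistics leak into preprocessing. This also speaks to the leakage question raised earlier.

# Suggested variation: fill the continuous nans in test with train medians only.
train_medians = train_imp[cont_names].median()
test_imp[cont_names] = test_imp[cont_names].fillna(train_medians)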
Apologies for the long narrative, but I'm relatively new to coding and this problem was hard to pin down. There is very little info on how to solve it online.

Unfortunately, this is still a workaround to the problem: if the number of classes in any feature is different for test, it will freak out. It's also strange that it didn't fillna while fitting test to the dls. Should you have any suggestions you are willing to share, please let me know.








