
The torchtext library (a text preprocessing library)

Date: 2020-04-02 18:23:04


Usage reference: https://zhuanlan.zhihu.com/p/31139113

Example routine:

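# Legacy torchtext API (Field, TabularDataset, BucketIterator), as shipped before torchtext 0.9.
from torchtext import data
from torchtext.data import TabularDataset, BucketIterator, Iterator
from torchtext.vocab import GloVe, Vectors


def get_data_iter(train_csv, test_csv, fix_length, batch_size, word2vec_dir):
    # Text column: tokenized, lowercased, padded or truncated to fix_length, batch dimension first.
    TEXT = data.Field(sequential=True, lower=True, fix_length=fix_length, batch_first=True)
    # Label column: already numeric, so no vocabulary is built for it.
    LABEL = data.Field(sequential=False, use_vocab=False)
    # CSV columns in order: label, title (ignored), text.
    train_fields = [("label", LABEL), ("title", None), ("text", TEXT)]
    train = TabularDataset(path=train_csv, format="csv", fields=train_fields, skip_header=True)
    # device=-1 selects the CPU in older torchtext releases.
    train_iter = BucketIterator(train, batch_size=batch_size, device=-1, sort_key=lambda x: len(x.text), sort_within_batch=False, repeat=False)
    test_fields = [("label", LABEL), ("title", None), ("text", TEXT)]
    test = TabularDataset(path=test_csv, format="csv", fields=test_fields, skip_header=True)
    test_iter = Iterator(test, batch_size=batch_size, device=-1, sort=False, sort_within_batch=False, repeat=False)
    # Alternative: load local word2vec vectors instead of GloVe.
    # vectors = Vectors(name=word2vec_dir)
    # TEXT.build_vocab(train, vectors=vectors)
    TEXT.build_vocab(train, vectors=GloVe(name="6B", dim=300))
    vocab = TEXT.vocab
    return train_iter, test_iter, vocab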
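
To make the effect of the lower=True and fix_length arguments on the TEXT field more concrete, here is a minimal plain-Python sketch of the kind of transformation they request. It is not torchtext's implementation. The helper name pad_or_truncate is hypothetical, and the sketch assumes whitespace tokenization and a "<pad>" padding token appended at the end of short sequences.

```python
def pad_or_truncate(text, fix_length, pad_token="<pad>"):
    """Illustrative stand-in for lowercasing plus fixed-length padding.

    Assumptions (not taken from the original post): tokens are split on
    whitespace, and short sequences are padded at the end.
    """
    # Split on whitespace and lowercase each token (the effect of lower=True).
    tokens = [tok.lower() for tok in text.split()]
    # Cut sequences longer than fix_length.
    tokens = tokens[:fix_length]
    # Pad shorter sequences so that every example has the same length.
    return tokens + [pad_token] * (fix_length - len(tokens))


short = pad_or_truncate("Hello World", 4)
long_ = pad_or_truncate("A very Long input sentence", 3)
print(short)
print(long_)
```

Because every example ends up with exactly fix_length tokens, each batch can be stacked into a rectangular tensor; with batch_first=True its shape would be (batch_size, fix_length).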

 


Original post: https://www.cnblogs.com/zf-blog/p/12621007.html
