 1. Define the batching scheme. It returns a dict that includes a batch_sizes array:
  {'min_length': 8, 'window_size': 720,
  'shuffle_queue_size': 270,
  'boundaries': [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 24, 26, 28, 30, 33, 36, 39, 42, 46, 50, 55, 60, 66, 72, 79, 86, 94, 103, 113, 124, 136, 149, 163, 179, 196, 215, 236],
  'max_length': 256,
  'batch_sizes': [240, 180, 180, 180, 144, 144, 144, 120, 120, 120, 90, 90, 90, 90, 80, 72, 72, 60, 60, 48, 48, 48, 40, 40, 36, 30, 30, 24, 24, 20, 20, 18, 18, 16, 15, 12, 12, 10, 10, 9, 8, 8]}
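The numbers above can be reproduced with a short sketch. It rests on assumptions that the notes do not state: boundaries grow by about 10% per step, each bucket's batch size is a token budget divided by the bucket's maximum length, and that value is rounded down to a divisor of window_size. The token budget of 2048, the 10% growth factor and the name make_batching_scheme are hypothetical, chosen only because they match the dict above.

```python
def make_batching_scheme(token_budget=2048, min_length=8, max_length=256,
                         window_size=720):
    """Hypothetical reconstruction of the scheme shown above."""
    # Bucket boundaries: start at min_length and grow by about 10% per step
    # (integer arithmetic for the 1.1 factor), always advancing by at least 1.
    boundaries = []
    x = min_length
    while x < max_length:
        boundaries.append(x)
        x = max(x + 1, x * 11 // 10)

    # One batch size per bucket: len(boundaries) + 1 buckets in total.
    divisors = [d for d in range(1, window_size + 1) if window_size % d == 0]
    batch_sizes = []
    for length in boundaries + [max_length]:
        raw = max(1, token_budget // length)
        # Round down to a divisor of window_size so a window splits evenly.
        batch_sizes.append(max(d for d in divisors if d <= raw))

    return {"min_length": min_length, "max_length": max_length,
            "window_size": window_size, "boundaries": boundaries,
            "batch_sizes": batch_sizes}


scheme = make_batching_scheme()
print(len(scheme["boundaries"]), len(scheme["batch_sizes"]))  # 41 42
```

Longer buckets get smaller batches, so every batch holds roughly the same number of tokens.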
2. input_pipeline: read the 10 input files and decode each record with decode_record,
  then assemble a dataset whose elements are dicts of the form {"src_id": ..., "target_id": ...}
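  (1) Filter by length: an example is kept or dropped according to the larger of its source-side and target-side sentence lengths.
           def example_valid_size(example, min_length, max_length):
               length = _example_length(example)
               return tf.logical_and(length >= min_length, length <= max_length)
           dataset = dataset.filter(functools.partial(example_valid_size, min_length=batching_scheme["min_length"], max_length=batching_scheme["max_length"]))
           filter is applied to every element of the dataset.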
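As a plain-Python illustration of the same rule (no TensorFlow), the sketch below keeps an example only if the longer of its two sides falls within [min_length, max_length]. The helper names example_length and keep_example and the sample data are hypothetical.

```python
def example_length(example):
    # Length of an example = the longer of its source and target sequences.
    return max(len(example["src_id"]), len(example["target_id"]))


def keep_example(example, min_length, max_length):
    length = example_length(example)
    return min_length <= length <= max_length


examples = [
    {"src_id": [1, 2, 3], "target_id": [4, 5]},                  # length 3
    {"src_id": list(range(10)), "target_id": list(range(8))},    # length 10
    {"src_id": list(range(300)), "target_id": list(range(12))},  # length 300
]
kept = [ex for ex in examples if keep_example(ex, min_length=8, max_length=256)]
print(len(kept))  # only the second example survives the filter
```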
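  (2) Choose a bucket id by length: pass in the dataset {"src_id": ..., "target_id": ...} and the boundaries, then compare each sentence length against the range of every bucket:
                        conditions_c = tf.logical_and(tf.less_equal(buckets_min, seq_length), tf.less(seq_length, buckets_max))
                        This returns the index of the bucket whose boundaries contain the length.
						With the id returned above, locate the bucket and look up its window size. The definition of the window is easiest to follow in the English documentation; as I understand it, a window is like a basket that can hold a fixed number of elements:
						window_size: A tf.int64 scalar tf.Tensor, representing the number of consecutive elements matching the same key to combine in a single batch, which will be passed to reduce_func. Mutually exclusive with window_size_func.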
						tf.contrib.data.group_by_window(
												key_func,
												reduce_func,
												window_size=None,
												window_size_func=None
						)
						Defined in tensorflow/contrib/data/python/ops/grouping.py.
A transformation that groups windows of elements by key and reduces them.
This transformation maps each consecutive element in a dataset to a key using key_func and groups the elements by key. It then applies reduce_func to at most window_size_func(key) elements matching the same key. All except the final window for each key will contain window_size_func(key) elements; the final window may be smaller.
You may provide either a constant window_size or a window size determined by the key through window_size_func.
						Args:
						key_func: A function mapping a nested structure of tensors (having shapes and types defined by self.output_shapes and self.output_types) to a scalar tf.int64 tensor.
						reduce_func: A function mapping a key and a dataset of up to window_size consecutive elements matching that key to another dataset.
						window_size: A tf.int64 scalar tf.Tensor, representing the number of consecutive elements matching the same key to combine in a single batch, which will be passed to reduce_func. Mutually exclusive with window_size_func.
						window_size_func: A function mapping a key to a tf.int64 scalar tf.Tensor, representing the number of consecutive elements matching the same key to combine in a single batch, which will be passed to reduce_func. Mutually exclusive with window_size.
						Returns:
						A Dataset transformation function, which can be passed to tf.data.Dataset.apply.
						Raises:
						ValueError: if neither or both of {window_size, window_size_func} are passed.
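To make the bucketing and windowing behaviour concrete, here is a plain-Python imitation. It is a sketch under assumptions, not the TensorFlow implementation: bucket_id and the generator group_by_window below are hypothetical stand-ins, and the toy boundaries and window sizes are made up for the example.

```python
from bisect import bisect_right
from collections import defaultdict


def bucket_id(length, boundaries):
    # Bucket i covers buckets_min[i] <= length < buckets_max[i], which is the
    # same as counting how many boundaries are <= length.
    return bisect_right(boundaries, length)


def group_by_window(elements, key_func, reduce_func, window_size_func):
    """Plain-Python imitation of the grouping behaviour described above."""
    pending = defaultdict(list)
    for element in elements:
        key = key_func(element)
        pending[key].append(element)
        # A full window is handed to reduce_func as soon as it fills up.
        if len(pending[key]) == window_size_func(key):
            yield reduce_func(key, pending.pop(key))
    # The final window of each key may hold fewer elements.
    for key, window in sorted(pending.items()):
        yield reduce_func(key, window)


boundaries = [4, 8]        # toy buckets: <4, 4..7, >=8
window_sizes = [3, 2, 2]   # toy window size per bucket
sentences = [[0] * n for n in [2, 5, 3, 9, 6, 1, 10, 7]]

batches = list(group_by_window(
    sentences,
    key_func=lambda s: bucket_id(len(s), boundaries),
    reduce_func=lambda key, window: (key, [len(s) for s in window]),
    window_size_func=lambda key: window_sizes[key],
))
print(batches)
```

Each emitted batch contains only sentences from one bucket, so the amount of padding needed inside a batch stays small.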
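	(3) Padding: grouped_dataset.padded_batch(batch_size, padded_shapes). Here grouped_dataset is the dataset of same-key elements that group_by_window passes to reduce_func, batch_size is the number of sentences per batch, and padded_shapes gives the shape each component is padded to.
	Putting it together, the id sequences are turned into matrices: dataset.apply(tf.contrib.data.group_by_window(example_to_bucket_id, batching_fn, None, ...)). Because window_size is None, a window_size_func must be supplied as the fourth argument.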
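The padding step can be pictured with a small plain-Python sketch. It assumes right-padding with id 0 up to the longest sequence in the batch; the function name padded_batch and the pad_id parameter are hypothetical and only mirror the idea, not the TensorFlow signature.

```python
def padded_batch(sequences, pad_id=0):
    # Pad every sequence on the right to the length of the longest one,
    # so the batch becomes a rectangular matrix of ids.
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]


batch = padded_batch([[5, 6, 7], [8], [9, 10]])
print(batch)  # [[5, 6, 7], [8, 0, 0], [9, 10, 0]]
```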
Part 2:
	1-D convolution: https://blog.csdn.net/appleyuchi/article/details/78597054
	tf.reshape: https://blog.csdn.net/lxg0807/article/details/53021859
	list and tuple: https://www.liaoxuefeng.com/wiki/0014316089557264a6b348958f449949df42a6d3a2e542c000/0014316724772904521142196b74a3f8abf93d8e97c6ee6000
	expand_dims: https://blog.csdn.net/qq_31780525/article/details/72280284
	tf.concat and tf.split: https://blog.csdn.net/momaojia/article/details/77603322  https://blog.csdn.net/UESTC_C2_403/article/details/73350457
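	feedforward: built from two 1-D convolution layers with a ReLU non-linearity between them. A residual connection then adds the inputs back, followed by normalization. In other words, the fully connected step is not done by calling layers.dense directly.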
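The feed-forward block described here can be sketched in plain Python. This assumes both convolutions have kernel size 1, in which case each one is the same dense layer applied independently at every position; the normalization omits learnable scale and shift. All function names and weight values are hypothetical.

```python
def dense(x, weights, bias):
    # x: vector of d_in values; weights: d_in x d_out; bias: d_out values.
    return [sum(x[i] * weights[i][j] for i in range(len(x))) + bias[j]
            for j in range(len(bias))]


def normalize(x, epsilon=1e-8):
    mean = sum(x) / len(x)
    variance = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / (variance + epsilon) ** 0.5 for v in x]


def feedforward(inputs, w1, b1, w2, b2):
    outputs = []
    for x in inputs:  # a kernel-size-1 convolution treats each position alone
        hidden = [max(0.0, h) for h in dense(x, w1, b1)]    # first layer + ReLU
        projected = dense(hidden, w2, b2)                   # second layer
        residual = [p + xi for p, xi in zip(projected, x)]  # add the inputs back
        outputs.append(normalize(residual))
    return outputs


w1 = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]    # 2 -> 3 units
b1 = [0.0, 0.1, 0.0]
w2 = [[0.2, -0.1], [0.4, 0.3], [-0.6, 0.7]]  # 3 -> 2 units
b2 = [0.0, 0.0]
inputs = [[1.0, 2.0], [0.5, -1.0]]           # 2 positions, 2 features each
outputs = feedforward(inputs, w1, b1, w2, b2)
print(outputs)
```

The second layer projects back to the input width, which is what makes the residual addition possible.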
	label_smoothing:
	(1) normalization:  normalized = (inputs - mean) / ( (variance + epsilon) ** (.5) )
                    outputs = gamma * normalized + beta   (the mean and variance of inputs are computed first)
	             '''Applies layer normalization.
						 Args:
						  inputs: A tensor with 2 or more dimensions, where the first dimension has
							`batch_size`.
						  epsilon: A floating number. A very small number for preventing ZeroDivision Error.
						  scope: Optional scope for `variable_scope`.
						  reuse: Boolean, whether to reuse the weights of a previous layer
							by the same name.
						Returns:
						  A tensor with the same shape and data dtype as `inputs`.
				  '''
                 Do beta and gamma not do anything?
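A plain-Python sketch of the two formulas may help with the question about beta and gamma. It assumes gamma starts as all ones and beta as all zeros (a common initialisation, not confirmed by these notes), in which case the output initially equals the normalized values; during training both are learnable and let the layer rescale and shift the result. The function name layer_normalize is hypothetical.

```python
def layer_normalize(inputs, gamma=None, beta=None, epsilon=1e-8):
    n = len(inputs)
    # Assumed defaults: gamma = 1 and beta = 0 leave the normalized values as-is.
    gamma = [1.0] * n if gamma is None else gamma
    beta = [0.0] * n if beta is None else beta
    mean = sum(inputs) / n
    variance = sum((x - mean) ** 2 for x in inputs) / n
    normalized = [(x - mean) / ((variance + epsilon) ** 0.5) for x in inputs]
    return [g * z + b for g, z, b in zip(gamma, normalized, beta)]


plain = layer_normalize([1.0, 2.0, 3.0, 4.0])
shifted = layer_normalize([1.0, 2.0, 3.0, 4.0], gamma=[2.0] * 4, beta=[0.5] * 4)
print(plain)
print(shifted)
```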
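	 (2) embedding: uses an embedding method from TensorFlow to map each input id to a dense vector, so that relationships between words can be represented.
	                 The output has one more dimension than the input, and its last dimension is the number of hidden units.
	                The scale parameter rescales outputs according to num_units; the scaling is applied when scale is True. Is it True by default?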
					  '''Embeds a given tensor.
						Args:
						  inputs: A `Tensor` with type `int32` or `int64` containing the ids
							 to be looked up in `lookup table`.
						  vocab_size: An int. Vocabulary size.
						  num_units: An int. Number of embedding hidden units.
						  zero_pad: A boolean. If True, all the values of the first row (id 0)
							should be constant zeros.
						  scale: A boolean. If True, the outputs are multiplied by sqrt(num_units).
						  scope: Optional scope for `variable_scope`.
						  reuse: Boolean, whether to reuse the weights of a previous layer
							by the same name.
						Returns:
						  A `Tensor` with one more rank than `inputs`. The last dimensionality
							should be `num_units`.
					  '''
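                        It relies on one function whose role is like a Chinese-to-English lookup. One blog post explains it well: you pass in a tensor that acts as the dictionary,
						                       then give the ids you want represented, and you get back the corresponding tensor.
	                                           Link: https://www.jianshu.com/p/677e71364c8e  It also relates to one-hot encoding: https://blog.csdn.net/pipisorry/article/details/61193868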
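The lookup, zero_pad and scale behaviour from the docstring can be sketched without TensorFlow. This is only an illustration under assumptions: the function name embedding mirrors the docstring, but the random table, its size and the sample ids are hypothetical.

```python
import math
import random


def embedding(ids, lookup_table, zero_pad=True, scale=True):
    num_units = len(lookup_table[0])
    table = [list(row) for row in lookup_table]
    if zero_pad:
        # Row 0 (the padding id) is forced to all zeros.
        table[0] = [0.0] * num_units
    factor = math.sqrt(num_units) if scale else 1.0
    # Each id is replaced by its row, so the result gains one dimension.
    return [[[v * factor for v in table[i]] for i in sentence]
            for sentence in ids]


random.seed(0)
vocab_size, num_units = 5, 4
lookup_table = [[random.uniform(-1, 1) for _ in range(num_units)]
                for _ in range(vocab_size)]
outputs = embedding([[3, 1, 0], [2, 0, 0]], lookup_table)
print(len(outputs), len(outputs[0]), len(outputs[0][0]))  # 2 3 4
```

The input of shape (2, 3) becomes an output of shape (2, 3, 4), and the padding positions stay exactly zero, which is what the attention mask later relies on.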
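	   (3) multi-head attention:
	                    a. Dense projections of Q, K and V: a fully connected layer whose output has num_units as its last dimension,
						   with outputs = activation(inputs * kernel + bias)
	                    b. Masking: reduce_sum is used to find the positions whose values sum to 0 (padding). Those positions are masked by setting their attention score to a very large negative value, so they receive almost no weight after softmax.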
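The key-masking step in b. can be sketched in plain Python. It assumes that padding keys are exactly zero vectors and that the mask value is a large negative constant; the function name masked_attention_weights and the sample numbers are hypothetical.

```python
import math


def masked_attention_weights(scores, keys):
    # A key position counts as padding when its vector sums to zero.
    key_is_pad = [abs(sum(key)) == 0 for key in keys]
    very_negative = -2.0 ** 32 + 1
    weights = []
    for row in scores:  # one row of scores per query position
        masked = [very_negative if pad else s
                  for s, pad in zip(row, key_is_pad)]
        top = max(masked)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in masked]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


keys = [[0.3, -0.1], [0.2, 0.4], [0.0, 0.0]]   # third key is padding
scores = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]
weights = masked_attention_weights(scores, keys)
print(weights)
```

Even though the padded key has the highest raw score in the first row, it ends up with zero attention weight.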
	   (4) dropout:
	   (5) label_smoothing: smooths the target labels
	   (6) positional encoding: I still have some questions about this
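For item (5), a common form of label smoothing mixes the one-hot target with a uniform distribution over the K classes. The sketch below assumes that form and an epsilon of 0.1; neither is stated in these notes, and the function name is hypothetical.

```python
def label_smoothing(one_hot, epsilon=0.1):
    # Move epsilon of the probability mass from the true class to all K classes.
    k = len(one_hot)
    return [(1 - epsilon) * p + epsilon / k for p in one_hot]


smoothed = label_smoothing([0.0, 0.0, 1.0, 0.0])
print(smoothed)
```

The true class keeps most of the probability mass while the other classes receive a small non-zero share, and the values still sum to 1.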
	I have been studying this recently, but I still have many open questions.
Original article: https://www.cnblogs.com/Shaylin/p/9918178.html