
User Behavior Analysis

Date: 2019-11-19 01:18:17


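1. Dataset: Taobao user behavior data.

After downloading the data, the text file was imported into a MySQL database with Kettle. The time field ranges from 2014-11-18 00 to 2014-12-18 23. The table contains the following fields:

user_id (user ID, used by the queries below), item_id (product ID), behavior_type (four behaviors: click, favorite, add to cart, and purchase, encoded as 1, 2, 3, and 4), user_geohash (geographic location), item_category (category ID), time (when the behavior occurred).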

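2. Goals

  • Overall user shopping activity
  • User behavior conversion funnel
  • Characteristics of users with a high purchase rate and of users with a purchase rate of 0
  • User behavior habits along the time dimension
  • User analysis based on the RFM model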

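3. Data cleaning

1) Missing values

The user_geohash column contains a large number of NULL values, so this field is not processed further.

2) Making the data consistent

The time field holds both the date and the hour, so the hour is split out into its own column. A new column, date, is added before the time column as a copy of time and stores the year, month, and day; the time column then stores only the hour. This uses replace and substring_index (which splits a string on a given delimiter and returns the part before or after the chosen occurrence).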

  alter table exc1 add date varchar(20) not null after item_category;

  update exc1 set date = time;

  update exc1 set date = replace(date, date, substring_index(date, ' ', 1));

  update exc1 set time = replace(time, time, substring_index(time, ' ', -1));
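To make the splitting rule concrete, here is a minimal Python sketch of what SUBSTRING_INDEX does with a count of 1 and -1. The helper name `substring_index` is only an illustrative analogue written for this example, not part of any library, and it assumes a non-empty delimiter.

```python
def substring_index(text, delim, count):
    """Rough analogue of MySQL SUBSTRING_INDEX (assumes a non-empty delim).

    count > 0: everything left of the count-th delimiter from the left.
    count < 0: everything right of the |count|-th delimiter from the right.
    """
    if count == 0:
        return ""
    parts = text.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    return delim.join(parts[count:])


raw = "2014-11-18 00"
date_part = substring_index(raw, " ", 1)   # part before the space
hour_part = substring_index(raw, " ", -1)  # part after the space
print(date_part, hour_part)
```

With a single space in the value, a count of 1 keeps the date and a count of -1 keeps the hour, which is the split the two update statements rely on.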

3) Convert the four behavior_type values (click 1, favorite 2, add to cart 3, purchase 4) to 'pv', 'fav', 'cart', 'buy'

  update exc1 set behavior_type = replace(behavior_type, '1', 'pv');

  update exc1 set behavior_type = replace(behavior_type, '2', 'fav');

  update exc1 set behavior_type = replace(behavior_type, '3', 'cart');

  update exc1 set behavior_type = replace(behavior_type, '4', 'buy');
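The chained replace calls are safe here only because none of the new labels contains one of the digits 1 to 4. A direct lookup does not depend on that property. The sketch below shows the same mapping in Python; the names `BEHAVIOR_LABELS` and `label_behavior` are hypothetical and exist only for this illustration.

```python
# Hypothetical lookup table mirroring the four codes described above.
BEHAVIOR_LABELS = {"1": "pv", "2": "fav", "3": "cart", "4": "buy"}


def label_behavior(code):
    """Map a raw behavior_type code (int or str) to its label.

    Raises KeyError for codes outside 1-4 instead of passing them through.
    """
    return BEHAVIOR_LABELS[str(code)]


labels = [label_behavior(code) for code in (1, 2, 3, 4)]
print(labels)
```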

 

4) Inspecting the table structure shows that the date column is not of type DATE, so it is converted:

 alter table exc1 modify date date;

4. Building the model and analysis

1) Overall user shopping activity

Total page views (PV)

-- 总访问量 = total page views
select count(*) as `总访问量` from exc1 where behavior_type = 'pv';

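Page views per day (the alias 日均访问量 literally reads "average daily visits", but the query returns the PV count for each date)

select date, count(behavior_type) as `日均访问量` from exc1 where behavior_type = 'pv' group by date order by date;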

UV (total number of users)

-- 用户总数 = total number of users
select count(distinct user_id) as `用户总数` from exc1;

 

 

 Number of users who made at least one purchase

 -- 购买用户数量 = number of purchasing users
 select count(distinct user_id) as `购买用户数量` from exc1 where behavior_type = 'buy';

 

 

 Per-user shopping activity

-- Column aliases: 点击次数 = clicks, 收藏次数 = favorites, 加购数 = add-to-cart count, 购买次数 = purchases
create view user_behavior as
select user_id, count(behavior_type),
sum(case when behavior_type = 'pv' then 1 else 0 end) as `点击次数`,
sum(case when behavior_type = 'fav' then 1 else 0 end) as `收藏次数`,
sum(case when behavior_type = 'cart' then 1 else 0 end) as `加购数`,
sum(case when behavior_type = 'buy' then 1 else 0 end) as `购买次数`
from exc1
group by user_id
order by count(behavior_type) desc;
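The view relies on conditional aggregation: each `sum(case when ... then 1 else 0 end)` counts one behavior type per user. The following sketch runs the same pattern on a tiny in-memory SQLite table so the result can be checked by hand. The sample rows and the English column names (`pv_count` and so on) are made up for this example, and SQLite stands in for MySQL only for convenience.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table exc1 (user_id integer, behavior_type text)")

# Made-up sample events: (user_id, behavior_type)
sample_rows = [
    (1, "pv"), (1, "pv"), (1, "cart"), (1, "buy"),
    (2, "pv"), (2, "fav"),
    (3, "pv"), (3, "buy"), (3, "buy"),
]
conn.executemany("insert into exc1 values (?, ?)", sample_rows)

# One output row per user, one counter column per behavior type.
summary = conn.execute(
    """
    select user_id,
           count(*) as total,
           sum(case when behavior_type = 'pv'   then 1 else 0 end) as pv_count,
           sum(case when behavior_type = 'fav'  then 1 else 0 end) as fav_count,
           sum(case when behavior_type = 'cart' then 1 else 0 end) as cart_count,
           sum(case when behavior_type = 'buy'  then 1 else 0 end) as buy_count
    from exc1
    group by user_id
    order by total desc, user_id
    """
).fetchall()

for row in summary:
    print(row)
```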

 

 

 Repurchase rate (users with at least 2 purchases divided by users with at least 1 purchase)

select concat(round(sum(case when 购买次数 >= 2 then 1 else 0 end) / sum(case when 购买次数 > 0 then 1 else 0 end) * 100), '%') as `复购率`
from user_behavior;
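As a worked example of the repurchase-rate formula, the sketch below applies it to a made-up list of per-user purchase counts. The function name `repurchase_rate` and the sample numbers are assumptions for illustration only.

```python
def repurchase_rate(purchase_counts):
    """Share (in percent) of buyers who purchased at least twice.

    Returns None when nobody purchased, to avoid dividing by zero.
    """
    buyers = [n for n in purchase_counts if n > 0]
    if not buyers:
        return None
    repeat_buyers = sum(1 for n in buyers if n >= 2)
    return repeat_buyers / len(buyers) * 100


# Made-up purchase counts for six users: four buyers, two of them repeat buyers.
sample_rate = repurchase_rate([0, 1, 2, 5, 0, 1])
print(sample_rate)
```

Users with zero purchases are excluded from the denominator, matching the `购买次数 > 0` condition in the query.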

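2) User purchase behavior funnel

Totals for each behavior

-- 总点击数 = total clicks, 收藏总数 = total favorites, 加购物车总数 = total add-to-cart, 购买总数 = total purchases
select sum(点击次数) as `总点击数`, sum(收藏次数) as `收藏总数`, sum(加购数) as `加购物车总数`, sum(购买次数) as `购买总数` from user_behavior;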

Conversion rates between behaviors

select concat(round(sum(点击次数) * 100 / sum(点击次数), 2), '%') as `pv`,
concat(round((sum(收藏次数) + sum(加购数)) * 100 / sum(点击次数), 2), '%') as `pv_to_favcart`,
concat(round(sum(购买次数) * 100 / sum(点击次数), 2), '%') as `pv_to_buy`
from user_behavior;
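The arithmetic behind the three funnel percentages can be sketched as follows. The totals passed in are invented round numbers, not the dataset's actual counts, and `funnel_rates` is a hypothetical helper name.

```python
def funnel_rates(pv, fav, cart, buy):
    """Express each funnel stage as a percentage of total clicks (pv)."""
    if pv == 0:
        raise ValueError("pv must be positive to compute conversion rates")
    return {
        "pv": 100.0,
        "pv_to_favcart": round((fav + cart) * 100 / pv, 2),
        "pv_to_buy": round(buy * 100 / pv, 2),
    }


# Invented totals chosen so the result is easy to verify by hand.
rates = funnel_rates(pv=10000, fav=200, cart=300, buy=100)
print(rates)
```

Favorites and add-to-cart actions are summed into a single middle stage, as in the query, so the funnel has three levels rather than four.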

Drawing an interactive funnel chart:

import pandas as pd
import pyecharts as pec  # written against the pyecharts 0.5.x API

# Conversion rates in percent, copied from the query result above
rates = {'pv': 100, 'pv_to_favcart': 5.07, 'pv_to_buy': 1.04}
user = pd.DataFrame(data=rates, index=range(1))

attr = ['Click', 'Favorite or add to cart', 'Purchase']
# DataFrame.ix has been removed from pandas; read the single row with .loc
value = user.loc[0, ['pv', 'pv_to_favcart', 'pv_to_buy']].tolist()
funnel = pec.Funnel('User behavior funnel', width=800, height=600, title_pos='left')
funnel.add(name='User behavior',
           attr=attr,
           value=value,
           is_label_show=True,
           label_formatter='{b}{c}%',
           label_pos='outside',
           is_legend_show=True)

funnel.render()

 

 

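 About 5% of clicks are followed by a favorite or an add-to-cart, while only 1.04% lead to a purchase. The purchase conversion rate is fairly low, which suggests there is still considerable room for improvement.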

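3) Characteristics of users with a high purchase rate and of users with a purchase rate of 0

Users with a high purchase rate (sorted by purchase rate, descending):

-- 购买率 = purchase rate (purchases / clicks, in percent)
select user_id, 点击次数, 收藏次数, 加购数, 购买次数,
round(购买次数 / 点击次数 * 100, 2) as 购买率
from user_behavior
order by 购买率 desc;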
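The purchase rate divides by the click count, so users with zero clicks need care: in a MySQL SELECT, division by zero yields NULL rather than a number. A minimal sketch of the same calculation with an explicit guard is shown below; the user ids and counts are made up, and the helper names are hypothetical.

```python
def purchase_rate(clicks, purchases):
    """Purchases per click in percent, or None when there are no clicks."""
    if clicks == 0:
        return None
    return round(purchases * 100 / clicks, 2)


def sort_by_rate(users):
    """Sort (user_id, rate) pairs by rate descending, undefined rates last."""
    rated = [(uid, purchase_rate(c, p)) for uid, (c, p) in users.items()]
    return sorted(rated, key=lambda item: (item[1] is None, -(item[1] or 0.0), item[0]))


# Made-up users: user_id -> (clicks, purchases)
sample_users = {"u1": (2, 1), "u2": (100, 0), "u3": (0, 1), "u4": (50, 5)}
ranking = sort_by_rate(sample_users)
print(ranking)
```

The user with only 2 clicks and 1 purchase ranks first, which is the pattern the analysis goes on to describe: a high rate does not require many clicks.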

 

 

 

 

Sorted by number of purchases (descending)

select user_id, 点击次数, 收藏次数, 加购数, 购买次数,
concat(round(购买次数 / 点击次数 * 100, 2), '%') as 购买率
from user_behavior
order by 购买次数 desc;

 

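 Users with a high purchase rate do not have a high click count. Some of them bought after only 2 clicks, without favoriting or adding to cart first. These are purposeful shoppers who buy what they need, and they generally belong to the rational shopper group.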

Users with a purchase rate of 0:

select user_id, 点击次数, 收藏次数, 加购数, 购买次数,
round(购买次数 / 点击次数 * 100, 2) as 购买率
from user_behavior
order by 购买率 asc;

 

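Users with many clicks and many add-to-cart or favorite actions may be preparing for a merchant promotion.

Users with many clicks but a low or zero purchase rate are restrained customers: they like to compare, deliberate a lot, and show strong self-control. Alternatively, they may simply not complete payment.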

4) User behavior habits along the time dimension

Distribution of user activity by hour of the day

select time, count(behavior_type),
sum(case when behavior_type = 'pv' then 1 else 0 end) as `点击次数`,
sum(case when behavior_type = 'fav' then 1 else 0 end) as `收藏次数`,
sum(case when behavior_type = 'cart' then 1 else 0 end) as `加购数`,
sum(case when behavior_type = 'buy' then 1 else 0 end) as `购买次数`
from exc1
group by time
order by time;

Exporting the MySQL result to a CSV file:

select time, count(behavior_type),
sum(case when behavior_type = 'pv' then 1 else 0 end) as `点击次数`,
sum(case when behavior_type = 'fav' then 1 else 0 end) as `收藏次数`,
sum(case when behavior_type = 'cart' then 1 else 0 end) as `加购数`,
sum(case when behavior_type = 'buy' then 1 else 0 end) as `购买次数`
from exc1
group by time
order by time
into outfile 'E:/Pro/users.csv' fields terminated by ',' enclosed by '"' lines terminated by '\r\n';
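For reference, the file layout requested by the export options (comma-separated, every field wrapped in double quotes, lines ending in \r\n) can be reproduced with Python's csv module. This is only a sketch of the format: the rows are made up and `to_csv` is a hypothetical helper that writes to a string instead of a file.

```python
import csv
import io


def to_csv(rows):
    """Serialize rows with ',' separators, all fields quoted, CRLF line ends."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(
        buffer,
        delimiter=",",
        quotechar='"',
        quoting=csv.QUOTE_ALL,   # quote every field, like ENCLOSED BY '"'
        lineterminator="\r\n",
    )
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up hourly rows: (hour, total, clicks, favorites, add-to-cart, purchases)
csv_text = to_csv([("00", 10, 8, 1, 1, 0), ("01", 4, 4, 0, 0, 0)])
print(csv_text)
```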

  

 

 

 

Distribution of user activity by day of the week

-- '%W' returns the weekday name, so "order by weeks" sorts the names alphabetically
select date_format(date, '%W') as weeks, count(behavior_type),
sum(case when behavior_type = 'pv' then 1 else 0 end) as `点击次数`,
sum(case when behavior_type = 'fav' then 1 else 0 end) as `收藏次数`,
sum(case when behavior_type = 'cart' then 1 else 0 end) as `加购数`,
sum(case when behavior_type = 'buy' then 1 else 0 end) as `购买次数`
from exc1
group by weeks
order by weeks;
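Grouping by weekday name and then sorting by that name gives alphabetical rather than calendar order. If calendar order is preferred when reading the result, one option is to sort by weekday index instead. The sketch below does this in Python on a few made-up dates from the dataset's time range; `count_by_weekday` is a hypothetical helper.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def count_by_weekday(dates):
    """Count events per weekday and return them in Monday-to-Sunday order."""
    counts = Counter(WEEKDAYS[d.weekday()] for d in dates)
    return [(name, counts[name]) for name in WEEKDAYS if counts[name]]


# Made-up event dates within the 2014-11-18 to 2014-12-18 range.
sample_dates = [date(2014, 11, 18), date(2014, 11, 24),
                date(2014, 11, 25), date(2014, 12, 18)]
weekday_counts = count_by_weekday(sample_dates)
print(weekday_counts)
```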

 

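 5) Finding valuable customers with the RFM model

R (Recency): time since the most recent purchase
F (Frequency): purchase frequency
M (Monetary): amount spent

The dataset contains no purchase amounts, so only recency and frequency are scored.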

  • Scoring R (Recency): the fewer days since the last purchase, the more valuable the customer, so users are ranked by interval in ascending order
select a.*,
(@r := @r + 1) as recent_rank from (
select user_id, datediff('2014-12-19', max(date)) as recent from exc1
where behavior_type = 'buy' group by user_id order by recent) a, (select @r := 0) b;

  • Scoring F (Frequency): the more often a user purchases, the more valuable the customer
select a.*,
(@r2 := @r2 + 1) as freq_rank from (
select user_id, count(behavior_type) as frequency from exc1
where behavior_type = 'buy'
group by user_id
order by frequency desc
) a, (select @r2 := 0) b;

  • Joining the two rankings and attaching the scores
-- 4330 is the constant used for the quartile boundaries (presumably the number of ranked purchasing users)
select m.user_id, n.frequency, recent_rank, freq_rank,
concat(
case when recent_rank <= (4330)/4 then '4'
when recent_rank > (4330)/4 and recent_rank <= (4330)/2 then '3'
when recent_rank > (4330)/2 and recent_rank <= (4330)/4*3 then '2'
else '1' end,
case when freq_rank <= (4330)/4 then '4'
when freq_rank > (4330)/4 and freq_rank <= (4330)/2 then '3'
when freq_rank > (4330)/2 and freq_rank <= (4330)/4*3 then '2'
else '1' end
) as user_value
from (
select a.*, (@r1 := @r1 + 1) as recent_rank from (
select user_id, datediff('2014-12-19', max(date)) as recent
from exc1
where behavior_type = 'buy'
group by user_id order by recent
) a, (select @r1 := 0) as b) m,
(select a.*, (@r2 := @r2 + 1) as freq_rank from (
select user_id, count(behavior_type) as frequency
from exc1
where behavior_type = 'buy'
group by user_id order by frequency desc
) a, (select @r2 := 0) as b) as n
where m.user_id = n.user_id;
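The three steps above (days since last purchase, purchase count, quartile scores from the two ranks) can be traced on a small example. The sketch below is a plain-Python illustration, not the author's method: the purchase records are invented, ties are broken by user id (the SQL leaves tie order unspecified), and the number of buyers is taken from the data rather than hardcoded as 4330.

```python
from collections import Counter
from datetime import date


def rank_users(values, reverse=False):
    """Rank users 1..N by value; ties broken by user id (an assumption)."""
    ordered = sorted(values, key=lambda u: (-values[u] if reverse else values[u], u))
    return {user: rank for rank, user in enumerate(ordered, start=1)}


def quartile_score(rank, total):
    """Top quarter of the ranking scores 4, the bottom quarter scores 1."""
    if rank <= total / 4:
        return 4
    if rank <= total / 2:
        return 3
    if rank <= total / 4 * 3:
        return 2
    return 1


def rf_scores(purchases, reference=date(2014, 12, 19)):
    """purchases: iterable of (user_id, purchase_date) -> {user_id: 'RF'}."""
    last_purchase, frequency = {}, Counter()
    for user, day in purchases:
        frequency[user] += 1
        if user not in last_purchase or day > last_purchase[user]:
            last_purchase[user] = day
    recency = {u: (reference - d).days for u, d in last_purchase.items()}
    recent_rank = rank_users(recency)                # fewer days -> better rank
    freq_rank = rank_users(frequency, reverse=True)  # more purchases -> better rank
    total = len(last_purchase)
    return {u: f"{quartile_score(recent_rank[u], total)}"
               f"{quartile_score(freq_rank[u], total)}" for u in last_purchase}


# Invented purchase records for four users.
sample_purchases = [
    ("u1", date(2014, 12, 18)),
    ("u2", date(2014, 11, 18)), ("u2", date(2014, 11, 19)),
    ("u2", date(2014, 11, 20)), ("u2", date(2014, 11, 20)),
    ("u3", date(2014, 12, 5)), ("u3", date(2014, 12, 10)),
    ("u4", date(2014, 11, 25)), ("u4", date(2014, 11, 30)), ("u4", date(2014, 12, 1)),
]
scores = rf_scores(sample_purchases)
print(scores)
```

In the two-digit result the first digit is the recency score and the second is the frequency score, so "41" means a recent but infrequent buyer and "14" a frequent buyer who has not purchased recently.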

 

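 The scores show that users with user_value '41' (purchased recently but infrequently) pay attention fairly often but lack purchasing power. Suitable promotions, discounts, or bundled sales can encourage these customers to place orders.

Users with user_value '14' (frequent purchases, but none recently) show low attention and loyalty but strong purchasing power. Their shopping habits should be studied so that they can be targeted with precise marketing.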

 


Original source: https://www.cnblogs.com/hqczsh/p/11878859.html
