当前在线人数10375
首页 - 分类讨论区 - 电脑网络 - 数据库版 - 同主题阅读文章

此篇文章共收到打赏
0

  • 10
  • 20
  • 50
  • 100
您目前伪币余额:0
未名交友
[更多]
[更多]
问个database design的问题
[版面:数据库][首篇作者:Fine] , 2007年06月14日23:29:44 ,1052次阅读,10次回复
来APP回复,赚取更多伪币 关注本站公众号:
[分页:1 ]
Fine
进入未名形象秀
我的博客
[回复] [回信给作者] [本篇全文] [本讨论区] [修改] [删除] [转寄] [转贴] [收藏] [举报] [ 1 ]

发信人: Fine (coulda shoulda woulda), 信区: Database
标  题: 问个database design的问题
发信站: BBS 未名空间站 (Thu Jun 14 23:29:44 2007)

用的是MySQL. 要用来记录很多user的timestamp based的数据。做query的时候都是针
对单独的一个user. 每个记录象这样:

user_i timestamp1 value1
user_j timestamp2 value2
user_i timestamp3 value3
user_i timestamp3 value4

有三种选择,第一种就是用一个table来放所有数据,象上面一样。这样的坏处就是每
次要scan很多不相关user的record.

选择二:每个user用一个单独的table, 当然相同的schema.
选择三:每个user用一个单独的database.

二,三的好处就是query会快一点。但挺怪的。DB的高手们指教一下。先谢了。
--

※ 来源:·WWW 未名空间站 海外: mitbbs.com 中国: mitbbs.cn·[FROM: 68.165.]

 
NNYYCC
进入未名形象秀
我的博客
[回复] [回信给作者] [本篇全文] [本讨论区] [修改] [删除] [转寄] [转贴] [收藏] [举报] [ 2 ]

发信人: NNYYCC (牙膏皮换糖), 信区: Database
标  题: Re: 问个database design的问题
发信站: BBS 未名空间站 (Fri Jun 15 13:22:49 2007)

首先,你这数据库主要是用来干嘛的?估计得多大?是输入数据的时候多还是做报告的
时候多?server的configurations怎么样?

对于你描述的情况,比较常见的做法是,如果你是用来记录数据为主要目的,也就是所
谓的transactional database, 你首先要normalize, 也就是说,把user data放在一个
单独的table中,然后把加一个transactional table就够了。不管怎么样,你上面列的
3种方法都不专业,不可取。

举个简单的例子,你可以有两个table: Employee, Attandence

Employee (Emp_ID, Last_Name, First_Name, Emp_Status, Last_Updated)

Attandence(Emp_ID,Time_Stamp, Value, Date,...)

因为不了解你的具体要求,只能提这么点简单的建议。希望有点用。

--
※ 修改:·NNYYCC 於 Jun 15 13:24:27 2007 修改本文·[FROM: 156.145.]

 
Fine
进入未名形象秀
我的博客
[回复] [回信给作者] [本篇全文] [本讨论区] [修改] [删除] [转寄] [转贴] [收藏] [举报] [ 3 ]

发信人: Fine (coulda shoulda woulda), 信区: Database
标  题: Re: 问个database design的问题
发信站: BBS 未名空间站 (Fri Jun 15 14:11:32 2007)

非常感谢你的回复.

不专业是肯定的啦.呵呵.

举个例子吧:用DB来存股票价格历史. 每个记录包括股票代号,时间,价格:

SYMB1 timestamp1 price1
SYMB2 timestamp2 price2
SYMB3 timestamp3 price3
SYMB1 timestamp3 price4
...

imagine 1000 symbols, each symbol(stock) has 1million records.

做报告的时候比输入数据多.但每次输入tends to be in bulk volume.

这样,如果把所有记录放在一个table中,要严究一个股票(symbol),就要scan
1000million records.如果每个股票一个table,  那就只要scan 1million records.

谢谢谢谢

【 在 NNYYCC (牙膏皮换糖) 的大作中提到: 】
: 首先,你这数据库主要是用来干嘛的?估计得多大?是输入数据的时候多还是做报告的
: 时候多?server的configurations怎么样?
: 对于你描述的情况,比较常见的做法是,如果你是用来记录数据为主要目的,也就是所
: 谓的transactional database, 你首先要normalize, 也就是说,把user data放在一个
: 单独的table中,然后把加一个transactional table就够了。不管怎么样,你上面列的
: 3种方法都不专业,不可取。
: 举个简单的例子,你可以有两个table: Employee, Attandence
: Employee (Emp_ID, Last_Name, First_Name, Emp_Status, Last_Updated)
: Attandence(Emp_ID,Time_Stamp, Value, Date,...)
: 因为不了解你的具体要求,只能提这么点简单的建议。希望有点用。



--

※ 来源:·WWW 未名空间站 海外: mitbbs.com 中国: mitbbs.cn·[FROM: 66.166.]

 
NNYYCC
进入未名形象秀
我的博客
[回复] [回信给作者] [本篇全文] [本讨论区] [修改] [删除] [转寄] [转贴] [收藏] [举报] [ 4 ]

发信人: NNYYCC (牙膏皮换糖), 信区: Database
标  题: Re: Re: 问个database design的问题
发信站: BBS 未名空间站 (Fri Jun 15 14:33:28 2007)

You still didn't answer my question in the first paragraph. So I assume that
your database is a transactional database and would make suggestions based
on this assumption.

Based on your additional info, you may want to add at least another table,
let's call it STOCK.

STOCK(STOCK_ID, STOCK_SYMB, STOCK_DESC, LAST_UPDATED)

With another two tables shown below:

TRADER(TRADER_ID, LAST_NAME, FIRST_NAME, STATUS, LAST_UPDATED)

STOCK_TRADE(TRADER_ID, STOCK_ID, TRADE_TIME, TRADE_VOLUME, TRADE_PRICE, LAST
_UPDATED)

Several assumptions:

1. The trade price for each stock varies for each trade.

2. You don't have any other tool to log your data change. That's why I added
the column "Last_Updated" to each table. But it is your decision whether or
not you want to have this column.

For the transaction table STOCK_TRADE, you might want to create indecies on
the trader_id and stock_id columns and partition your table based on either
trade_time or last_updated column. This will significantly improve your
performance.

Finally, if you use the database for reporting purpose, it will be another
story. You might not want to normalize your whole database this way. A
denormalized data warehouse would be a better option.

Good luck!

--

※ 来源:·WWW 未名空间站 海外: mitbbs.com 中国: mitbbs.cn·[FROM: 156.145.]

 
wildhorse
进入未名形象秀
我的博客
[回复] [回信给作者] [本篇全文] [本讨论区] [修改] [删除] [转寄] [转贴] [收藏] [举报] [ 5 ]

发信人: wildhorse (黑马 -- 我是珠丝), 信区: Database
标  题: Re: Re: 问个database design的问题
发信站: BBS 未名空间站 (Fri Jun 15 15:37:15 2007), 转信

zan!说得挺清楚的
【 在 NNYYCC (牙膏皮换糖) 的大作中提到: 】
: You still didn't answer my question in the first paragraph. So I assume
that
:  your database is a transactional database and would make suggestions
based
: on this assumption.
: Based on your additional info, you may want to add at least another table,
: let's call it STOCK.
: STOCK(STOCK_ID, STOCK_SYMB, STOCK_DESC, LAST_UPDATED)
: With another two tables shown below:
: TRADER(TRADER_ID, LAST_NAME, FIRST_NAME, STATUS, LAST_UPDATED)
: STOCK_TRADE(TRADER_ID, STOCK_ID, TRADE_TIME, TRADE_VOLUME, TRADE_PRICE,
LAST
: _UPDATED)
: ...................


--

※ 来源:·BBS 未名空间站 海外: mitbbs.com 中国: mitbbs.cn·[FROM: 207.67.]

 
Fine
进入未名形象秀
我的博客
[回复] [回信给作者] [本篇全文] [本讨论区] [修改] [删除] [转寄] [转贴] [收藏] [举报] [ 6 ]

发信人: Fine (coulda shoulda woulda), 信区: Database
标  题: Re: Re: 问个database design的问题
发信站: BBS 未名空间站 (Fri Jun 15 17:32:11 2007)

Really appreciate it!

我没有糖,但你也不用用牙膏皮来换了,送你5个包子略表谢意.呵呵.

OK. back to the question.

Here, assume I'm doing research on stock price history. I have tick data for
1000 stocks, each with 1 million ticks, in separate text(.csv) files, one
file per stock.

I would like to import the data (csv files) into database and run queries on
the DB, and I only run query on one stock at a time. This is not really a
transactional database, if I understand what it is -- the import part
happens very infrequently, while I need to do a lot of queries on each
individual stock.

I could have a table like you suggested:

STOCK(STOCK_ID, STOCK_SYMB, STOCK_DESC, LAST_UPDATED)

(although I don't really care about the desc of the stock, all I care is the
symb)

another table:

PRICE_HISTORY(STOCK_ID, TIMESTAMP, PRICE)

so again, back to my early question, if all price history is in one table
like the above, I have to scan 1000million records just to get the full
history of one stock.

since there are lots of repetition of the STOCK_ID in the table, indexing on
STOCK_ID is probably not efficient.

On the other hand, if I have a table for each stock:

PRICE_HISTORY_SYMBO_i(TIMESTAMP, PRICE)

I just need to scan the table I want and it should be a lot more faster than
the first approach. But then I'll have 1000 tables with the same schema.

What do you think?

【 在 NNYYCC (牙膏皮换糖) 的大作中提到: 】
: You still didn't answer my question in the first paragraph. So I assume
that
:  your database is a transactional database and would make suggestions
based
: on this assumption.
: Based on your additional info, you may want to add at least another table,
: let's call it STOCK.
: STOCK(STOCK_ID, STOCK_SYMB, STOCK_DESC, LAST_UPDATED)
: With another two tables shown below:
: TRADER(TRADER_ID, LAST_NAME, FIRST_NAME, STATUS, LAST_UPDATED)
: STOCK_TRADE(TRADER_ID, STOCK_ID, TRADE_TIME, TRADE_VOLUME, TRADE_PRICE,
LAST
: _UPDATED)
: ...................



--

※ 来源:·WWW 未名空间站 海外: mitbbs.com 中国: mitbbs.cn·[FROM: 66.166.]

 
xlxie
进入未名形象秀
我的博客
[回复] [回信给作者] [本篇全文] [本讨论区] [修改] [删除] [转寄] [转贴] [收藏] [举报] [ 7 ]

发信人: xlxie (三老爷), 信区: Database
标  题: Re: Re: 问个database design的问题
发信站: BBS 未名空间站 (Fri Jun 15 23:43:10 2007)

one additional recommendation.
if you have 1 billion records for short period, you might have to do table
partitioning, depends on your DB.


【 在 NNYYCC (牙膏皮换糖) 的大作中提到: 】
: You still didn't answer my question in the first paragraph. So I assume
that
:  your database is a transactional database and would make suggestions
based
: on this assumption.
: Based on your additional info, you may want to add at least another table,
: let's call it STOCK.
: STOCK(STOCK_ID, STOCK_SYMB, STOCK_DESC, LAST_UPDATED)
: With another two tables shown below:
: TRADER(TRADER_ID, LAST_NAME, FIRST_NAME, STATUS, LAST_UPDATED)
: STOCK_TRADE(TRADER_ID, STOCK_ID, TRADE_TIME, TRADE_VOLUME, TRADE_PRICE,
LAST
: _UPDATED)
: ...................



--

※ 来源:·WWW 未名空间站 海外: mitbbs.com 中国: mitbbs.cn·[FROM: 69.138.]

 
Fine
进入未名形象秀
我的博客
[回复] [回信给作者] [本篇全文] [本讨论区] [修改] [删除] [转寄] [转贴] [收藏] [举报] [ 8 ]

发信人: Fine (coulda shoulda woulda), 信区: Database
标  题: Re: Re: 问个database design的问题
发信站: BBS 未名空间站 (Sat Jun 16 00:32:02 2007)

thanks. I don't have 1 billion records, just several millions or even less.

【 在 xlxie (三老爷) 的大作中提到: 】
: one additional recommendation.
: if you have 1 billion records for short period, you might have to do table
: partitioning, depends on your DB.
: that
: based
: LAST




--

※ 来源:·WWW 未名空间站 海外: mitbbs.com 中国: mitbbs.cn·[FROM: 68.165.]

 
q101
进入未名形象秀
我的博客
[回复] [回信给作者] [本篇全文] [本讨论区] [修改] [删除] [转寄] [转贴] [收藏] [举报] [ 9 ]

发信人: q101 (^(?:[DX]?HT|X|YA)ML$), 信区: Database
标  题: Re: 问个database design的问题
发信站: BBS 未名空间站 (Sat Jun 16 22:04:06 2007), 转信

【 在 Fine (coulda shoulda woulda) 的大作中提到: 】
: 用的是MySQL. 要用来记录很多user的timestamp based的数据。做query的时候都是针
: 对单独的一个user. 每个记录象这样:
: user_i timestamp1 value1
: user_j timestamp2 value2
: user_i timestamp3 value3
: user_i timestamp3 value4
: 有三种选择,第一种就是用一个table来放所有数据,象上面一样。这样的坏处就是每
: 次要scan很多不相关user的record.

我看你应该先看看这些table columns有没有normalization的空间再决定
怎么分表,我估计你这情况用 star schema database model 最简单实用,
或者根本就一个表就ok。。

: 选择二:每个user用一个单独的table, 当然相同的schema.
: 选择三:每个user用一个单独的database.

后两个一看就不行,如果你有很多用户,怎么解决scability的问题。。


--

※ 来源:·BBS 未名空间站 海外: mitbbs.com 中国: mitbbs.cn·[FROM: 129.49.]

 
fengbright
进入未名形象秀
我的博客
[回复] [回信给作者] [本篇全文] [本讨论区] [修改] [删除] [转寄] [转贴] [收藏] [举报] [ 10 ]

发信人: fengbright (Aliang), 信区: Database
标  题: Re: 问个database design的问题
发信站: BBS 未名空间站 (Sun Jun 17 13:00:10 2007)

How about define index on userid?
--

※ 来源:·WWW 未名空间站 海外: mitbbs.com 中国: mitbbs.cn·[FROM: 76.160.]

 
NNYYCC
进入未名形象秀
我的博客
[回复] [回信给作者] [本篇全文] [本讨论区] [修改] [删除] [转寄] [转贴] [收藏] [举报] [ 11 ]

发信人: NNYYCC (牙膏皮换糖), 信区: Database
标  题: Re: Re: 问个database design的问题
发信站: BBS 未名空间站 (Mon Jun 18 11:16:09 2007)

I believe table partition will be a good option for your data. Also, create
an index on stock_id AFTER table is partitioned.1000 stocks for a multi-
million row table is still considered low cardinality and an index will
improve the performance.

【 在 Fine (coulda shoulda woulda) 的大作中提到: 】
: Really appreciate it!
: 我没有糖,但你也不用用牙膏皮来换了,送你5个包子略表谢意.呵呵.
: OK. back to the question.
: Here, assume I'm doing research on stock price history. I have tick data
for
:  1000 stocks, each with 1 million ticks, in separate text(.csv) files, one
: file per stock.
: I would like to import the data (csv files) into database and run queries
on
:  the DB, and I only run query on one stock at a time. This is not really a
: transactional database, if I understand what it is -- the import part
: happens very infrequently, while I need to do a lot of queries on each
: ...................



--

※ 来源:·WWW 未名空间站 海外: mitbbs.com 中国: mitbbs.cn·[FROM: 156.145.]

[分页:1 ]
[快速返回] [ 进入数据库讨论区] [返回顶部]
回复文章
标题:
内 容:

未名交友
将您的链接放在这儿

友情链接


 

Site Map - Contact Us - Terms and Conditions - Privacy Policy

版权所有,未名空间(mitbbs.com),since 1996