From: tailang (西瓜太郎), Board: Database
Title: Re: question on large tables (>=800 million records, 10 GB of data)
Site: BBS MITBBS (Wed Jan 17 22:03:56 2007)
Have you ever considered a non-DB approach?
The data structure looks straightforward. With some compression, you should be able to fit all the data, including an 8-byte unique key per record, into memory on a single 64-bit machine with 16 GB of RAM.
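As a rough illustration of how the records might be packed into memory, here is a minimal sketch using fixed-width binary records. The field widths and the use of a 4-byte Unix timestamp are assumptions for illustration, not the poster's actual layout:

```python
import struct

# Hypothetical packed record: 8-byte cabId, 4-byte Unix timestamp,
# and two 4-byte floats -- 20 bytes per record before any compression.
# (Assumed layout; the post's 10 GB figure implies a tighter or
# compressed encoding in practice.)
RECORD = struct.Struct("<8sIff")  # cabId, timestamp, longitude, latitude

def pack_record(cab_id: bytes, ts: int, lon: float, lat: float) -> bytes:
    """Serialize one record into the fixed-width binary layout."""
    return RECORD.pack(cab_id.ljust(8, b"\0"), ts, lon, lat)

def unpack_record(buf: bytes):
    """Deserialize one fixed-width record back into its fields."""
    cab_id, ts, lon, lat = RECORD.unpack(buf)
    return cab_id.rstrip(b"\0"), ts, lon, lat

rec = pack_record(b"CAB00042", 1168999436, -122.4194, 37.7749)
assert len(rec) == RECORD.size  # 20 bytes per record
```

At 20 bytes per record, 800 million records come to about 16 GB raw, which is why some compression (or a narrower encoding) would be needed to leave headroom on a 16 GB machine.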
Now think about the inverted indices. You need to build an index on each of cabId/timestamp/longitude/latitude; each index could fit on a single machine.
When a query comes in, fetch the sets of matching unique keys from all relevant indices and find the intersection. Then do your final lookup on the data host to return the result.
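The query path described above can be sketched as follows. All names here (the index structures, the key type, the `query` helper) are illustrative assumptions, not the poster's actual design; only cabId and timestamp indices are shown, but longitude/latitude would work the same way:

```python
from bisect import bisect_left, bisect_right

# Per-field indices mapping field values to record keys (assumed design).
cab_index: dict[str, set[int]] = {}      # cabId -> set of record keys
time_index: list[tuple[int, int]] = []   # sorted (timestamp, key) pairs

def keys_for_cab(cab_id: str) -> set[int]:
    """Exact-match lookup on the cabId index."""
    return cab_index.get(cab_id, set())

def keys_in_time_range(t0: int, t1: int) -> set[int]:
    """Range scan over the sorted (timestamp, key) index."""
    lo = bisect_left(time_index, (t0, 0))
    hi = bisect_right(time_index, (t1, 2**63))
    return {key for _, key in time_index[lo:hi]}

def query(cab_id: str, t0: int, t1: int) -> set[int]:
    # Intersect the candidate key sets from each index; the final
    # record fetch on the data host happens only for these keys.
    return keys_for_cab(cab_id) & keys_in_time_range(t0, t1)

# Tiny demo dataset.
cab_index["CAB00042"] = {1, 2, 3}
time_index = sorted([(100, 1), (150, 2), (200, 3), (150, 9)])
matches = query("CAB00042", 100, 160)  # keys 1 and 2 fall in range
```

Because each index returns only keys, the sets stay small and cheap to ship between machines; only the intersected survivors hit the data host.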
I would expect an average execution time of under 10 ms. How to update/serialize records is another thing you may want to consider, but that is beyond the scope of this post.
【 Quoting the post by babycry (babycry): 】
: Hello, this is a data set for data mining.
: I believe the experiences on this case should be helpful in general.
: The question is, how to make fast queries on large tables
: (>=800 million records, 10G bytes of data)
: with ordinary machines ?
: Below are some details:
: There is only one table, with the following fields:
: cabId CHAR(8), timestamps DATETIME, longitude FLOAT, latitude FLOAT,
: We want to be able to query on cabId, timestamps, latitude, and longitude.
※ Source: BBS MITBBS http://mitbbs.com [FROM: 71.198.]