Cassandra 2.0 - CQL SELECT with lower bound


Let's say I have a Cassandra DB and I need to process a large set of data that I can query with a SELECT. The problem is that the processing is very slow, and I want to use a distributed system to do the work.

I know that with the LIMIT clause of CQL I can get a limited number of rows, but I would need something like LIMIT and OFFSET so that each process can get an independent share of the data. (Will OFFSET eventually be implemented in CQL? I have read that it would be inefficient; is that the reason it has not been implemented?)

I would like to avoid waiting for one query to finish before starting the next one, as has been suggested, because the processes would sit idle while waiting for the previous queries to complete.


For example, suppose I would like to process weather data and, for the moment, my table looks like this (I could use other types to store the data, such as timestamp or timeuuid for time; this is just a dummy problem):

  CREATE TABLE weather_data (station varchar, date varchar, time varchar, value double, PRIMARY KEY ((station, date), time));

For a given station and date, I want to process segments of the data (based on time). I can assume that I know how many measurements I have for each station and date.
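To make the idea of time-based segments concrete, here is a minimal Python sketch that only builds the bounds and the CQL text; it does not talk to a cluster. It assumes that time is stored as a zero-padded 'HH:MM:SS' string, so that string comparison matches chronological order. The names `time_segments`, `segment_params` and `SEGMENT_QUERY` are made up for this illustration.

```python
def fmt(seconds):
    """Format a number of seconds since midnight as 'HH:MM:SS'."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return "%02d:%02d:%02d" % (hours, minutes, secs)


def time_segments(n_segments):
    """Split one day into n_segments half-open [lower, upper) ranges."""
    bounds = [i * 86400 // n_segments for i in range(n_segments + 1)]
    return [(fmt(lo), fmt(hi)) for lo, hi in zip(bounds, bounds[1:])]


# Range query on the clustering column, with the partition key fully specified.
# The '?' markers are bind markers for a prepared statement.
SEGMENT_QUERY = (
    "SELECT time, value FROM weather_data "
    "WHERE station = ? AND date = ? AND time >= ? AND time < ?"
)


def segment_params(station, date, n_segments):
    """One tuple of bind values per segment, i.e. one query per worker."""
    return [(station, date, lo, hi) for lo, hi in time_segments(n_segments)]
```

Each worker would take one tuple and run the same statement with it. The last upper bound is '24:00:00', which is not a real time but sorts after every valid value under the format assumed above.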

If the correct answer is "change the structure of the table", then I would be happy to see how to modify it.

I edited my answer because I misunderstood the original problem. What I would do is split the information related to a station and date into further sub-sections, for example by hour of the day or whatever division is appropriate for you.

  CREATE TABLE weather_data (station varchar, date varchar, dayhour int, time varchar, value double, PRIMARY KEY ((station, date), dayhour, time));

In this way you can split your data into 24 portions and allow parallel execution, as I said earlier. You could also read only the first 2 hours, for example. The downside is that you will always hit the same nodes, because all hours of a station and date live in the same partition. Another option is to make the primary key:
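As a rough sketch of how the 24 portions could be handed out, the Python below builds one set of bind values per hour and distributes them round-robin over a number of workers. It assumes a table with a dayhour int column as a clustering column; `HOUR_QUERY`, `hourly_params` and `assign_to_workers` are hypothetical names, and no driver call is made here.

```python
# One query per hour: equality on the first clustering column (dayhour).
HOUR_QUERY = (
    "SELECT time, value FROM weather_data "
    "WHERE station = ? AND date = ? AND dayhour = ?"
)


def hourly_params(station, date, hours=range(24)):
    """Bind values for one query per requested hour of the day."""
    return [(station, date, hour) for hour in hours]


def assign_to_workers(params, n_workers):
    """Round-robin split so every worker gets an independent share."""
    return [params[i::n_workers] for i in range(n_workers)]


if __name__ == "__main__":
    shares = assign_to_workers(hourly_params("ST01", "2014-01-15"), 4)
    for worker_id, share in enumerate(shares):
        print(worker_id, [hour for _, _, hour in share])
```

Reading only the first 2 hours would be `hourly_params("ST01", "2014-01-15", hours=range(2))`.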

  PRIMARY KEY ((station, date, dayhour), time)

This will spread your data across different nodes on a dayhour basis, with the side effect that if you need all the measurements from a given station for a given date, you must then perform 24 queries. The last but not least solution could be denormalization (reorganize the data by hour in a new table and leave the original one as it is).
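The 24 queries do not have to run one after another. The sketch below fans them out with a thread pool and merges the results in hour order. It assumes dayhour is part of the partition key; `fetch_hour` and `FAKE_TABLE` are hypothetical in-memory stand-ins for a real driver call and a real table, used only so that the example runs on its own.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample data, keyed by the partition key (station, date, dayhour).
FAKE_TABLE = {
    ("ST01", "2014-01-15", 0): [("00:10:00", 1.5), ("00:40:00", 1.7)],
    ("ST01", "2014-01-15", 13): [("13:05:00", 9.2)],
}


def fetch_hour(station, date, dayhour):
    """Stand-in for running, against the cluster:
    SELECT time, value FROM weather_data
    WHERE station = ? AND date = ? AND dayhour = ?
    """
    return FAKE_TABLE.get((station, date, dayhour), [])


def fetch_day(station, date, max_workers=8):
    """Run the 24 per-hour queries concurrently and merge the rows."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so rows stay sorted by hour.
        per_hour = pool.map(lambda h: fetch_hour(station, date, h), range(24))
        return [row for rows in per_hour for row in rows]


if __name__ == "__main__":
    print(fetch_day("ST01", "2014-01-15"))
```

With a real driver, `fetch_hour` would execute a prepared statement instead of reading from a dictionary; the fan-out and merge logic would stay the same.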

HTH, Carlo
