- How to save big query cost2020년 09월 29일 14시 27분 14초에 업로드 된 글입니다.작성자: DE 군고구마반응형
Hello.
Many people seem to know the part of optimizing cost and performance by fetching data without reading unnecessary parts in big queries using partition columns.There are many advantages to using partitioned columns as above, but there may be cases where you cannot.
If there are several date columns in the table, the case where the date column to be used is not partitioned is as follows. Big query does not provide multiple indexes (partition columns) like regular RDBs. Therefore, we need to improve the performance through one partition column. In this case, a very difficult situation arises.
If you get a value into the date column as the basis, the data doesn't fit, but if you have to write it because of performance or cost, it doesn't make sense because you have to write strange data as it is. Therefore, if you retrieve data through the following methods, you can capture both performance and cost and get accurate data.
This method is quick to understand if you look through the picture as follows.TODAY column: 2019-03-01, 2019-03-02… Load together in date format
LOGTIME column (partition: 2019-03-01 12:30:20 loaded in datetime format
What should I do if I need to aggregate and show the number of cases by date from 2020-02-25 to 2020-03-03 by referring to the above?
In general, you can create a query like this:1234567select TODAY,count(*)from test_tablewhere TODAY > '2020-02-25' and TODAY < '2020-03-04'group by TODAYorder by TODAY ascs If you do as above, the TODAY column is not a partition column, so it will not have a good effect on performance.
You can see that the cost is high by looking at the estimated cost before executing the query in Big Query like this:However, if you change the query by using the partitioned column as follows, you can see that the cost drops tremendously.
12345select TODAY,count(*)from test_tablewhere (LOGTIME > '2020-02-24' and LOGTIME < '2020-03-05') and (TODAY > '2020-02-25' and TODAY < '2020-03-04')group by TODAYorder by TODAY asccs Looking at the cost of this query:
The cost has been reduced tremendously. Will the results be the same?
Of course, the result is exactly the same. As you can see above, we take LOGTIME back and forth +- 1 day more (in case you missed it) and filter it back to TODAY. This allows you to get completely accurate data while lowering your costs by significantly reducing the range.
This is a very extreme scenario that I made, but there are several scenarios, and you can see a real example of using it in the post I wrote earlier. (Go to Example)
This has helped me a lot by reducing search costs by over 100 times. In particular, in the case of big queries, this seems to be very helpful because small companies often use a daily quota limit.
Thank you.반응형'GCP > 운영관련 개발' 카테고리의 다른 글
[Cloud Build] GCS <-> github 동기화 실패 (failed: step exited with non-zero status: 1) (0) 2022.12.06 GCP - all open 방화벽 체크 (0) 2021.01.04 GCP 방화벽(firewall) 설정 (0) 2020.12.31 GCP-stackdriver python api 사용하기(list_time_series) (0) 2020.03.27 다음글이 없습니다.이전글이 없습니다.댓글