Our customers often need to export call information from the Call Detail Record (CDR) section into their own systems. As you already know, all historical statistics are stored in Elasticsearch, and the documentation shows an example of retrieving data through our REST API. But what if there are hundreds or thousands of calls? How do you fetch that much data? This article answers that question.


Scroll

To retrieve large amounts of data, Elasticsearch provides the scroll API, and our REST API supports the same mechanism. An example request body:


webitel_scroll_request.json
{
    "scroll" : "5m",
    "limit": 1000,
    "sort": {
        "created_time": {
            "order": "desc",
            "unmapped_type": "boolean"
        }
    },
    "index": "cdr-a",
    "query": "*",
    "columns": [
        "created_time",
        "uuid",
        "direction",
        "duration"
    ],
    "filter": [
        {
            "bool": {
                "must": [
                    {
                        "range": {
                            "created_time": {
                                "gte": "now/w",
                                "lte": "now"
                            }
                        }
                    }
                ]
            }
        }
    ]
}

We have added two parameters to the request body:

  • scroll - how long the server keeps the query result (the search context) alive between requests; 5m means five minutes
  • limit - the maximum number of records returned in each response (the page size)
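Because each response holds at most limit records, the number of requests grows with the number of matching calls. The sketch below assumes the client stops at the first page that is shorter than limit, which is how the bonus script at the end of this article decides it is finished; requests_needed is a hypothetical helper, not part of the Webitel API. Note that an exact multiple of limit still costs one extra request, which comes back empty.

```bash
#!/bin/bash
# Hypothetical helper: number of scroll requests needed to fetch TOTAL records
# with page size LIMIT, assuming the client stops at the first page holding
# fewer than LIMIT records (an empty final page still counts as a request).
requests_needed() {
    local total=$1 limit=$2
    # Every full page forces one more request; the short page ends the loop.
    echo $(( total / limit + 1 ))
}

requests_needed 2500 1000   # pages of 1000, 1000, 500 -> prints 3
requests_needed 2000 1000   # pages of 1000, 1000, 0   -> prints 3
```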

Next, send the first request with this body to the REST API. For simplicity, we use the cURL command-line utility:

curl -s -L -XPOST \
    -H 'Content-Type: application/json' \
    -H 'X-Access-Token: ciOiJIUzI1NiJ9.jEyM2UxNThjLWVkNzMtNDAwi'\
    "https://pre.webitel.com/engine/api/v2/cdr/text" -d@webitel_scroll_request.json

The response contains at most limit records, together with a _scroll_id field that identifies the query result kept on the server.

All subsequent requests go to the /scroll endpoint and pass the received value as scrollId in the request body. Example:

curl -s -L -XPOST \
    -H 'Content-Type: application/json' \
    -H 'X-Access-Token: eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpZCI6I9.IjKpitL05OLjUPeUQyd4E' \
    "https://pre.webitel.com/engine/api/v2/cdr/text/scroll" -d '
    {
        "scroll": "5m",
        "scrollId": "<received _scroll_id>"
    }'
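Embedding the ID by hand inside a single-quoted -d argument is easy to get wrong. One option is a small helper that builds the body with printf. scroll_body is a hypothetical name, and the sketch assumes the scroll ID contains no characters that need JSON escaping (Elasticsearch scroll IDs are typically base64-encoded strings).

```bash
#!/bin/bash
# Hypothetical helper: build the body of a follow-up /scroll request.
# $1 = _scroll_id from the previous response, $2 = keep-alive time (default 5m).
# Assumes the ID needs no JSON escaping (base64-like characters only).
scroll_body() {
    local scroll_id=$1 keep_alive=${2:-5m}
    printf '{"scroll": "%s", "scrollId": "%s"}\n' "$keep_alive" "$scroll_id"
}

# Usage with cURL ($TOKEN and $scroll_id are placeholders):
# curl -s -L -XPOST \
#     -H 'Content-Type: application/json' \
#     -H "X-Access-Token: $TOKEN" \
#     "https://pre.webitel.com/engine/api/v2/cdr/text/scroll" \
#     -d "$(scroll_body "$scroll_id")"

scroll_body 'EXAMPLE_SCROLL_ID'   # prints {"scroll": "5m", "scrollId": "EXAMPLE_SCROLL_ID"}
```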

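Repeat this request until all data has been collected: a response with fewer records than limit (or none at all) is the last page. Elasticsearch may return a new scroll ID in a response; if it does, send the most recently received one.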
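The loop itself can be sketched in plain bash. Here fetch_page is a hypothetical stand-in for one cURL request (the first call would send the initial body, later ones the /scroll body) that prints how many records it received; the mock version simulates 2,500 matching calls so the sketch runs without a server.

```bash
#!/bin/bash
# Sketch of the scroll loop: request pages until one is shorter than LIMIT.
# fetch_page is hypothetical: it takes the request number (0 = first request)
# and prints the number of records in that response.
scroll_all() {
    local limit=$1 request=0 size total=0
    while :; do
        size=$(fetch_page "$request")
        total=$(( total + size ))
        request=$(( request + 1 ))
        # A short (or empty) page means the server has no more data.
        if [ "$size" -lt "$limit" ]; then
            break
        fi
    done
    echo "$total records in $request requests"
}

# Mock server: 2,500 matching calls returned in pages of at most 1,000.
fetch_page() {
    local pages=(1000 1000 500)
    echo "${pages[$1]:-0}"
}

scroll_all 1000   # prints "2500 records in 3 requests"
```

In a real client, fetch_page would also save each page (for example, append it to a CSV file) before returning its size.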

Bonus

As a bonus, here is a small bash script that uses cURL and jq to download the data and save it to a CSV file:

#!/bin/bash
#
# Download CDR records through the scroll API and save them to cdr.csv.
# Requires cURL and jq; the request body is read from webitel_scroll_request.json.
set -eu

API='https://pre.webitel.com/engine/api/v2/cdr/text'
# Replace with your own access token
TOKEN='eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpZCI6IjEyM2UxNThjLWVkNzMtNDAzOC1hOWExLTA5Y2MxZjk4ZDJmYSIsImV4cCI6MTU0OTQ5MDQwMDAwMCwiZCI6IndlYml0ZWwuZHJydXBpYWguY29tIiwidCI6ImRvbWFpbiIsInYiOjJ9.IjK6q1ra6um1ZJ0_gJImkNcZUpitL05OLjUPeUQyd4E'
LIMIT=1000   # must match "limit" in webitel_scroll_request.json

rm -f cdr.csv
tmpfile=$(mktemp /tmp/scroll.XXXXXX)
trap 'rm -f "$tmpfile"' EXIT   # remove the temporary file on any exit

# First request: returns the first page and a _scroll_id
curl -s -L -XPOST \
    -H 'Content-Type: application/json' \
    -H "X-Access-Token: ${TOKEN}" \
    "${API}" -d @webitel_scroll_request.json > "$tmpfile"

scroll_id=''
total=0
while :
do
    # Append the current page to the CSV and count its records
    jq -r '.hits.hits[].fields | [.created_time[], .uuid[], .direction[], .duration[]] | @csv' "$tmpfile" >> cdr.csv
    size=$(jq '.hits.hits | length' "$tmpfile")
    total=$(( total + size ))
    echo "$size"

    # A page shorter than LIMIT means all data has been received
    if [ "$size" -lt "$LIMIT" ]; then
        break
    fi

    # Keep the most recent _scroll_id if the response contains one
    new_id=$(jq -r '._scroll_id // empty' "$tmpfile")
    scroll_id=${new_id:-$scroll_id}

    # Next page: pass the scroll ID in the body of a /scroll request
    curl -s -L -XPOST \
        -H 'Content-Type: application/json' \
        -H "X-Access-Token: ${TOKEN}" \
        "${API}/scroll" \
        -d "{\"scroll\": \"5m\", \"scrollId\": \"${scroll_id}\"}" > "$tmpfile"
done

echo "$total - done"
exit 0

The request body must be saved as webitel_scroll_request.json in the directory you run the script from, and its limit must match the page size the script expects (1000).


Good luck with your requests!
