LLAP & CGroups: a marriage made in heaven

Hive LLAP (for Live Long and Process), also called Interactive Query on HDInsight, is a service that promises sub-second performance for queries on very large tables. To reach interactive latencies, LLAP builds on Hadoop: it uses the Tez execution engine and adds long-running LLAP daemons that cache data, take advantage of JIT optimization, and eliminate most of the startup cost. Caching, pre-fetching, some query processing and access control are moved into the daemons. Small queries are mostly processed by the daemons directly, while heavy requests go through the “classic” Hive path, in standard YARN containers.

LLAP uses Apache Slider to run the long-lived daemons on YARN. As with your other Hadoop components, sizing matters and the setup can be tricky; Hortonworks has published some well-written articles about sizing and setup on its community forum (see the sources below).

Like any other application on YARN, the LLAP daemons are restricted in CPU and memory consumption. For LLAP, this is driven by two properties (an example follows the list):

  • hive.llap.daemon.yarn.container.mb
  • hive.llap.daemon.vcpus.per.instance
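For example, the sizing could end up looking like this; the values below are purely illustrative, the right ones depend on your node sizes and on the Hortonworks sizing guidance:

    # hypothetical sizing for worker nodes with ~64 GB of RAM and 12 cores
    hive.llap.daemon.yarn.container.mb=61440
    hive.llap.daemon.vcpus.per.instance=12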

The problem is: when you run LLAP (tested on HDP 2.6.3, but the issue seems to be present on HDP 3.0 too), only the memory restriction is applied correctly, while the number of allocated vcores is always 1. This is harmless if cgroups and CPU isolation are not activated, since only the memory is actually limited. But if cgroups is activated, as it should be, you end up with huge daemons running in heavy containers with more than 64GB of RAM and only one CPU core…
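As a reminder, CPU isolation through cgroups is typically enabled on the NodeManagers with the standard YARN properties below (the values are only indicative; adapt them to your cluster):

    # LinuxContainerExecutor with the cgroups resource handler
    yarn.nodemanager.container-executor.class=org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor
    yarn.nodemanager.linux-container-executor.resources-handler.class=org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler
    # turn the vcore allocation into a hard CPU limit (this is what starves a 1-vcore daemon)
    yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage=true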

For instance, with a 3-daemon LLAP configuration, only 4 vcores are allocated in total (1 for the ApplicationMaster and 1 for each daemon), while each daemon still gets nearly 60GB of RAM.
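You can see this in the ResourceManager UI, or query the RM REST API. A minimal sketch in Python, assuming the ResourceManager listens on port 8088 and the LLAP Slider application is named llap0 (the usual default; adjust both to your setup):

    import requests

    RM = "http://my-resourcemanager:8088"  # hypothetical ResourceManager address

    apps = requests.get(RM + "/ws/v1/cluster/apps",
                        params={"states": "RUNNING"}).json()["apps"]["app"]

    for app in apps:
        if app["name"] == "llap0":
            # before the fix: allocatedVCores stays at 4 (1 AM + 1 per daemon)
            print("%s: %d vcores, %d MB over %d containers" % (
                app["name"], app["allocatedVCores"],
                app["allocatedMB"], app["runningContainers"]))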

So, why?

LLAP is just a Slider/YARN application. Log into one of your HiveServer2 Interactive instances and look at the resources.json used for LLAP under /usr/hdp/XXX/hive2/tmp/llap-XXXX; you should see a JSON file containing:

    "LLAP": {
      "yarn.role.priority": "1",
      "yarn.component.instances": "3",
      "yarn.resource.normalization.enabled": "false",
      "yarn.memory": "..."
      ...
    }

yarn.memory is there, but yarn.vcores is not. The root cause lies in how the hive.llap.daemon.vcpus.per.instance property is used: in the source code of the LLAP server (https://github.com/apache/hive/blob/branch-2.1/llap-server/src/main/resources/package.py#L18) we can see a dict vars containing all the parameters handed to YARN, like the direct memory allocation or the YARN queue name. So far, so good. But when we look at the resources template in https://github.com/apache/hive/blob/branch-3.0/llap-server/src/main/resources/templates.py#L118, we find yarn.memory but no yarn.vcores… Let’s fix that!
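Under the hood, package.py renders that template with plain Python %-style dict substitution, so adding a placeholder to the template is all it takes, as long as the matching key is present in the vars dict. A simplified sketch of the mechanism, with made-up values:

    # simplified view of how package.py turns templates.py into resources.json
    resources_template = """
    {
      "components": {
        "LLAP": {
          "yarn.component.instances": "%(instances)d",
          "yarn.memory": "%(container.mb)d",
          "yarn.vcores": "%(container.vcores)d"
        }
      }
    }
    """

    # hypothetical values; the real ones are computed from the hive.llap.* properties
    vars = {
        "instances": 3,
        "container.mb": 61440,
        "container.vcores": 12,
    }

    print(resources_template % vars)  # the rendered resources.json handed to Slider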

On the hosts holding the HiveServer2 Interactive instances, this script is located at /usr/hdp/XXX/hive2/scripts/llap/slider/templates.py. Edit templates.py and replace:

    "LLAP": {
      "yarn.role.priority": "1",
      "yarn.component.instances": "%(instances)d",
      "yarn.resource.normalization.enabled": "false",
      "yarn.memory": "%(container.mb)d",
      "yarn.component.placement.policy" : "%(placement)d"
    }

by:

  "LLAP": {
      "yarn.role.priority": "1",
      "yarn.component.instances": "%(instances)d",
      "yarn.resource.normalization.enabled": "false",
      "yarn.memory": "%(container.mb)d",
      "yarn.vcores": "%(container.vcores)d",
      "yarn.component.placement.policy" : "%(placement)d"
    }

Restart Hive, and enjoy your newly unleashed power!
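Once HiveServer2 Interactive is back up, you can double-check that the freshly generated resources.json now carries the vcores. A quick sketch (the glob mirrors the path seen earlier and assumes the usual Slider layout with a top-level "components" section; adjust if your layout differs):

    import glob, json, os

    # most recently generated LLAP Slider package (path pattern as seen above)
    path = max(glob.glob("/usr/hdp/*/hive2/tmp/llap-*/resources.json"),
               key=os.path.getmtime)

    with open(path) as f:
        llap = json.load(f)["components"]["LLAP"]

    print(path)
    print("yarn.memory = %s" % llap["yarn.memory"])
    print("yarn.vcores = %s" % llap["yarn.vcores"])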

Sources:
– https://community.hortonworks.com/articles/149486/llap-sizing-and-setup.html
– https://community.hortonworks.com/articles/215868/hive-llap-deep-dive.html
– https://github.com/apache/hive/blob/branch-2.1/llap-server

Credits:
“abstract art blur board” by pixabay is licensed under CC0 1.0 / Resized
