Recovering Lost Terraform State for Accidental Deletion or Corruption

October 5, 2023 by hnminh, posted in DevOps

Introduction

Terraform, a powerful Infrastructure as Code (IaC) tool, has transformed how organizations manage their cloud resources and infrastructure. At its core lies the Terraform state, a critical component that tracks deployed resources and their configurations. Yet, when this state is lost or corrupted, it can pose significant challenges in infrastructure management. In this comprehensive guide, we’ll explore the nuances of the Terraform state, discuss common scenarios leading to its loss or corruption, and provide insights and techniques to recover your Terraform state effectively, ensuring the stability and reliability of your IaC projects.

Understanding Terraform State

1. What is Terraform State?

Terraform state forms the foundation of Terraform’s functionality for managing Infrastructure as Code (IaC). It acts as the single source of truth for the current state of your infrastructure. Essentially, Terraform state records all resources it creates and manages, capturing their current attributes and relationships. This information enables Terraform to make informed decisions regarding resource creation, modification, or deletion during subsequent runs. Think of Terraform state as a map of your infrastructure, facilitating Terraform’s understanding of the existing state and effective orchestration of changes.

2. Where is Terraform State Stored?

Local Backend

Terraform state can be stored using various backends, with the local backend being one common option. In the local backend, the state resides in a local file within your project directory. While suitable for smaller projects or personal use, it has limitations concerning collaboration and scalability. Local state files are susceptible to loss, misplacement, or becoming outdated if not managed meticulously.

Remote Backends

To address the limitations of local storage, Terraform offers remote backends as an alternative. Remote backends store the state file centrally, typically in cloud storage or on a remote server. Popular choices include Amazon S3, Google Cloud Storage, and HashiCorp’s Terraform Cloud. Depending on the backend, they can also provide state locking and versioning of the state file. Using a remote backend improves collaboration and keeps the state secure and accessible, and it is considered a best practice for production environments.

Understanding where Terraform state is stored and the differences between local and remote backends is crucial for effective state management and recovery. In the following sections, we’ll delve into scenarios that might lead to Terraform state loss or corruption and strategies to mitigate these issues.

Common Scenarios for Terraform State Loss or Corruption

Despite Terraform’s robustness, there are situations where Terraform’s state can be compromised, either through loss or corruption. Familiarizing yourself with these scenarios is vital for proactive state management and recovery strategies. Here are some common scenarios:

1. Accidental Deletion

Accidentally deleting Terraform state files is a frequent scenario, especially in collaborative projects. It can occur when team members mistakenly remove or overwrite state files. State file deletions result in Terraform losing its infrastructure knowledge, making managing or modifying existing resources challenging. This scenario underscores the importance of preventing unintentional file removal and implementing robust backup procedures.

2. Corruption

Terraform state files can become corrupted for various reasons, including network interruptions, disk errors, or software bugs. Corruption may arise when state files are not adequately saved or synchronized with the remote backend. Corrupted state files are often unreadable by Terraform, rendering them useless. Detecting and recovering from state file corruption is a critical skill to maintain infrastructure stability.

3. Loss of State Configuration

Another scenario leading to state issues is the loss of state configuration itself. This happens when the state backend configuration is lost or misconfigured. Terraform cannot access the state file without the correct configuration, rendering resource management impossible. Documenting and securely storing state backend configurations is vital to prevent disruptions in such cases.

Steps to Recover Terraform State for Infrastructure

Recovering the Terraform state is essential when facing issues like loss or corruption. Depending on the severity of the problem, you may need different steps to restore your infrastructure’s state. Here’s a systematic guide to navigate the recovery process:

1. Identify the Extent of the Issue

Before initiating the recovery process, assess the damage and understand the problem’s scope. Determine which resources and components of your infrastructure are affected by the state issue. Key questions to answer include:

– What specific resources are missing or corrupted in the state?

– Are there any dependencies between resources that need consideration?

– Does the issue impact a single resource or the entire state?

Understanding the issue’s extent guides your recovery efforts and helps prioritize resource recovery.

2. Locate Backup State Files

If you’ve followed best practices for Terraform state management, you should have backup state files readily available. Look for them in the following locations:

– Version Control: Review your version control system (e.g., Git) for previous versions of the Terraform state file. Historical versions are available only if the state file was committed to version control.

– Remote Backends: If you use a remote backend, older versions of the state file may exist in the backing storage, such as Amazon S3 or Google Cloud Storage, provided versioning is enabled there. Retrieve the most recent version preceding the issue.

– Local Backups: If you maintain local backups, search for them in designated backup directories or cloud storage services like Dropbox or Google Drive.
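For an S3 backend, retrieving an older copy of the state might look like the sketch below. It assumes that bucket versioning was enabled before the incident; the function names, bucket name, and object key are hypothetical. With the local backend, also check for a terraform.tfstate.backup file next to terraform.tfstate, since Terraform keeps the previous state there.

```shell
# Sketch: list and download earlier versions of a state object stored in S3.
# Assumes bucket versioning is enabled; bucket and key names are placeholders.

# Print version id, last-modified time and "is latest" flag for each stored version.
list_state_versions() {
  bucket=$1
  key=$2
  aws s3api list-object-versions \
    --bucket "$bucket" \
    --prefix "$key" \
    --query 'Versions[].[VersionId,LastModified,IsLatest]' \
    --output text
}

# Download one specific version to a local file for inspection.
fetch_state_version() {
  bucket=$1
  key=$2
  version_id=$3
  out=$4
  aws s3api get-object \
    --bucket "$bucket" \
    --key "$key" \
    --version-id "$version_id" \
    "$out"
}
```

A typical call would be list_state_versions my-tf-state-bucket prod/terraform.tfstate, followed by fetch_state_version with the chosen version id and an output path such as restored.tfstate. Inspect the downloaded file before using it for anything.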

3. Use ‘terraform state’ Commands

Terraform provides a set of commands under the terraform state subcommand for inspecting and managing state, alongside the related terraform import command. These commands help recover resources from backup state files. The most useful ones are:

– terraform state list: Lists all resources in the current state, which helps identify missing resources.

– terraform state pull: Retrieves the current state from the backend and prints it as JSON, allowing you to verify its content and identify discrepancies.

– terraform state push: Replaces the state in the backend with a local state file. Use this command with caution, as it overwrites the existing state. Employ it only when you are certain the file is correct.

– terraform import: Imports an existing resource into Terraform’s state under a given resource address. This is valuable for recovering resources that still exist but are missing from the state.
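The commands above can be combined into a cautious workflow, sketched below as three small helpers. The function names, file names, and the example resource address are illustrative and not part of any existing project.

```shell
# Sketch: back up the current state before any repair, then restore or re-import.
# Function names and file names are illustrative.

# Save the current state (local or remote) to a timestamped file and list what it tracks.
snapshot_state() {
  backup="state-backup-$(date +%Y%m%d%H%M%S).tfstate"
  terraform state pull > "$backup"
  terraform state list
  echo "saved current state to $backup"
}

# Overwrite the backend's state with a known-good file, then review the plan.
restore_state() {
  known_good=$1
  terraform state push "$known_good"
  terraform plan
}

# Re-attach a resource that still exists in the cloud but is missing from the state.
reimport_resource() {
  address=$1      # e.g. aws_s3_bucket.logs (hypothetical address)
  resource_id=$2  # e.g. the bucket name
  terraform import "$address" "$resource_id"
}
```

Taking the snapshot first means a mistaken push can be undone. Note that terraform state push normally refuses a file whose lineage differs from the existing state or whose serial is lower; the -force flag overrides that check and should be used only after reviewing the file.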

4. Recover the State of Resources by Referencing a Clone Environment’s Terraform State

In some cases, the above steps may not suffice to recover the Terraform state of your infrastructure. You might need to clone or replicate an environment unaffected by the state issue. Ideally identical or similar to your production environment, this clone environment can serve as a valuable reference for recovering resource states. Subsequently, you can use the Terraform state from the clone environment as a placeholder to recover the state of the target environment.

For example, suppose you have two environments, one for production and one for non-production, provisioned from the same IaC source code. The Terraform state for each environment then shares a similar structure and schema. To recover the Terraform state:

Clone the Terraform state file from the other environment, which will serve as a template for recovery.
Update the resource objects in the state file with the relevant labels and identifiers, such as the resource ARN, resource name, policy_id, and unique_id of AWS IAM resources. Use the AWS CLI to retrieve information that the AWS Console does not show, such as a resource’s unique ID.
Review other resource details, such as the VPC, NAT Gateway, or Internet Gateway, and replace the relevant values with those of the infrastructure being recovered.
Run terraform refresh (terraform apply -refresh-only on recent Terraform versions, where terraform refresh is deprecated) to verify that the recovered state works, then run terraform plan to confirm that the recovery is complete. The plan should resemble what it showed before the state loss, proposing at most changes or updates to resources rather than recreating them all.
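The steps above could be scripted roughly as follows. This is a minimal sketch under stated assumptions: the helper names, file names, role name, and IDs are all placeholders, and a plain text substitution is only safe when the old identifier is unique within the file.

```shell
# Sketch: turn a copy of another environment's state into a recovery template.
# All file names, role names and IDs are placeholders.

# Look up the unique ID of an IAM role with the AWS CLI.
role_unique_id() {
  aws iam get-role --role-name "$1" --query 'Role.RoleId' --output text
}

# Copy a state file, replacing one identifier with another.
# Note: sed treats the old value as a basic regular expression.
retarget_state() {
  src=$1
  dst=$2
  old=$3
  new=$4
  sed "s|$old|$new|g" "$src" > "$dst"
}

# Push the edited file and check that Terraform plans updates, not re-creation.
verify_recovered_state() {
  terraform state push "$1"
  terraform plan
}
```

For instance, retarget_state nonprod.tfstate recovered.tfstate "$old_id" "$(role_unique_id prod-app-role)" would swap one IAM unique ID, and the call can be repeated for each identifier that differs between the environments. Review the resulting file by hand before running verify_recovered_state on it.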

Note: This recovery process may not recover data such as AWS key pair private keys. In such cases, consider creating a new AWS key pair if your IaC code requires one.

The Art of Dockerfile Definition: Unveiling Good Practices for Ultimate Containerization Success

August 3, 2023 by hnminh, posted in Uncategorized

Introduction

Common Problems with Dockerfiles

Dockerfiles, which are used to create Docker images, provide a powerful and efficient way to package applications and their dependencies. However, they can also introduce several common problems:

Image Size: One of the primary concerns with Docker images is their size. Docker images can become too large if not optimized properly, leading to slower builds, deployments, and increased storage requirements.
Layering and Caching: Docker uses a layering system to build images incrementally. When a layer changes, every layer after it must be rebuilt, so a poorly ordered Dockerfile prevents later builds from reusing cached layers. Conversely, splitting the Dockerfile into too many layers can hurt performance because each layer carries its own overhead.
Security Vulnerabilities: Docker images may include vulnerable packages or configurations, potentially exposing the system to security risks. Care must be taken to ensure that images are built from trusted sources and that unnecessary packages are removed.
Non-reproducible Builds: If Dockerfiles are not properly version-controlled and documented, it can be challenging to reproduce the exact same image for different environments or deployments.
Overuse of Latest Tags: Relying on “latest” tags for base images can lead to inconsistencies and instability as the base image may change over time.

Importance of Good Practices

Good practices in Dockerfiles are of paramount importance in containerization and application deployment. Dockerfiles serve as blueprints for creating Docker images, and adhering to best practices ensures that these images are built efficiently, securely, and maintainably.

Image Size Problem

Choosing a Suitable Base Image

A base image is the starting point for building a Docker container. It is the foundation on which your application or service will be built. Evaluating base image options involves considering various factors to determine which base image best fits your specific use case. This decision can have significant implications on the efficiency, security, and performance of your final Docker image.

Key considerations when evaluating base image options in Dockerfile practices include:

Official vs. Third-Party Images: Decide whether to use official Docker Hub images provided by the software vendors or third-party images created by the community.
Image Size: Choose a base image that is as small as possible to reduce the overall size of your final Docker image.
Security and Vulnerabilities: Consider the security track record of the base image and whether it receives regular security updates.
Customization Flexibility: Assess how easy it is to customize the base image to fit your application’s specific needs.

Most official images are published in at least the three variants below:

1. Alpine Images:

Size: Alpine images are the smallest and most lightweight among the three. They have a significantly smaller footprint, making them ideal for resource-constrained environments and quicker container startups.
Package Selection: Alpine uses its own package manager, “apk” (Alpine Package Keeper), and a minimalistic approach to package selection. It includes only essential packages, which contributes to its smaller size.
Dependencies: Alpine images use musl libc and BusyBox, which are smaller alternatives to glibc and the GNU core utilities and provide a more minimalist environment.

2. Debian Images:

Size: Debian images are larger compared to Alpine due to their more comprehensive package repository and glibc usage.
Package Selection: Debian has a vast package repository with a wide selection of packages, providing more versatility and options for various applications and use cases.
Dependencies: Debian images use the glibc library and provide a more feature-rich environment with a broader range of included packages.

3. Slim Images:

Size: Slim images are a variant of the base distribution (e.g., Debian-slim) optimized for a smaller footprint by removing unnecessary packages and documentation.
Package Selection: Slim images include a reduced set of packages compared to the full distribution, aiming to strike a balance between size and functionality.
Dependencies: Slim images offer a middle-ground between the minimalism of Alpine and the broader package selection of the regular distribution.

In summary:

Alpine images are the smallest and most lightweight, focusing on minimalism and efficient resource utilization.
Debian images offer a comprehensive package repository, making them more versatile for various applications but with a larger size.
Slim images provide a reduced size compared to the full distribution, serving as a compromise between full functionality and minimal footprint.
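To see the difference for a concrete image, the sizes of the variants can be compared locally. The sketch below uses hypothetical helper names; the python:3.11 tags mentioned afterwards are only an example, so substitute the image you actually use.

```shell
# Sketch: compare the on-disk size of several variants of one image.

# Print the size of a locally available image in whole megabytes.
image_size_mb() {
  bytes=$(docker image inspect --format '{{.Size}}' "$1")
  echo $((bytes / 1024 / 1024))
}

# Pull each variant and print its size next to its name.
compare_variants() {
  for image in "$@"; do
    docker pull --quiet "$image" > /dev/null
    echo "$image: $(image_size_mb "$image") MB"
  done
}
```

A call such as compare_variants python:3.11 python:3.11-slim python:3.11-alpine prints one line per variant, which makes the trade-off between footprint and included packages easy to judge for your own base image.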

Minimizing Image Size

Minimizing the image size in a Docker image is essential for efficient containerization and faster deployments. Several best practices can be employed to achieve a smaller image size:

1. Cleaning up After Each Step:

Docker images are built in layers, and each layer can introduce additional files and artifacts. To minimize image size, it’s crucial to clean up unnecessary files and temporary artifacts after each step in the Dockerfile. Utilize the RUN command judiciously, and if any step generates temporary files, ensure they are removed in the same RUN instruction. This prevents unnecessary files from being included in the final image, resulting in a leaner and more efficient container.

# DON'T
FROM debian:buster-slim
USER root

RUN set -x && apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y \
    ca-certificates curl

# DO
FROM debian:buster-slim
USER root

# Install only what is needed and clean up in the same layer
RUN set -x && apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
    ca-certificates curl && \
    apt-get autoremove --yes && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

2. Removing Temporary Files and Artifacts:

During the build process, certain intermediate files may be necessary for compiling or building the application. However, these files are not required in the final image and only contribute to increased image size. Identify and delete these temporary files before proceeding to the next step. Adding appropriate rm or cleanup commands after using temporary files ensures they do not persist in the final Docker image.

# DON'T
FROM debian:buster-slim
USER root

RUN set -x && apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y \
    ca-certificates curl

# DO
FROM debian:buster-slim
USER root

# Remove package caches and lists before the layer is committed
RUN set -x && apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
    ca-certificates curl && \
    apt-get autoremove --yes && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

3. Minimize the Number of Layers:

Instructions that modify the file system, such as RUN, COPY, and ADD, each create a new layer in the image. Reducing the number of layers can reduce the overall image size. Consider combining related commands into a single RUN instruction. However, be cautious not to combine unrelated commands, as doing so hurts readability and maintainability.

# DON'T
FROM golang:1.18-buster as builder

RUN mkdir /root/.ssh/

RUN echo "$GO_SSH_PRIVATE_KEY" > /root/.ssh/id_rsa && \
    chmod 600 /root/.ssh/id_rsa && \
    echo "    IdentityFile ~/.ssh/id_rsa" >> /etc/ssh/ssh_config

RUN touch /root/.ssh/known_hosts && \
    echo "Host bitbucket.org\n\tStrictHostKeyChecking no\n" >> ~/.ssh/config && \
    ssh-keyscan -H bitbucket.org >> ~/.ssh/known_hosts

RUN echo '[url "ssh://git@bitbucket.org/"]' >> ~/.gitconfig && \
    echo '        insteadOf = https://bitbucket.org/' >> ~/.gitconfig
    
RUN apt update 
RUN apt install -y curl wget

# DO
FROM golang:1.18-buster as builder

RUN mkdir /root/.ssh/

RUN echo "$GO_SSH_PRIVATE_KEY" > /root/.ssh/id_rsa && \
    chmod 600 /root/.ssh/id_rsa && \
    echo "    IdentityFile ~/.ssh/id_rsa" >> /etc/ssh/ssh_config && \
    touch /root/.ssh/known_hosts && \
    echo "Host bitbucket.org\n\tStrictHostKeyChecking no\n" >> ~/.ssh/config && \
    ssh-keyscan -H bitbucket.org >> ~/.ssh/known_hosts && \
    echo '[url "ssh://git@bitbucket.org/"]' >> ~/.gitconfig && \
    echo '        insteadOf = https://bitbucket.org/' >> ~/.gitconfig
RUN apt update && apt install -y curl wget
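To check what a change like this actually does to an image, the layers can be listed after a build. The helper below is a small sketch; the function name is hypothetical, and the image tag passed in is whatever you built.

```shell
# Sketch: show the size of each layer and the instruction that created it.
layer_report() {
  docker history --no-trunc --format '{{.Size}}  {{.CreatedBy}}' "$1"
}
```

Running layer_report myapp:dev before and after merging RUN instructions shows how many layers remain and which ones carry the most data.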

4. Leverage Build Cache:

Docker provides a caching mechanism during image builds. Utilize this cache by ordering the instructions in the Dockerfile carefully. Place commands that change frequently, such as copying application source code, towards the end of the Dockerfile. This allows Docker to reuse cached layers for unchanged parts, avoiding unnecessary reinstallation of dependencies or rebuilding.

5. Multi-stage Builds:

Multi-stage builds are an effective way to create smaller Docker images. They involve breaking the build process into multiple stages, with each stage having a specific purpose. In the initial stage, dependencies are installed, and the application is built. Then, in the final stage, only the compiled application and necessary files are copied over, discarding any unnecessary intermediate artifacts. This approach ensures that only the essential components are included in the final image, resulting in a significantly smaller image size.
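As an illustration, the helper below writes a two-stage Dockerfile for a Go service. It is a sketch under assumptions: go.mod and go.sum are expected at the repository root together with the main package, and the function name, output path, and binary name app are placeholders.

```shell
# Sketch: write a two-stage Dockerfile to the given path.
write_multistage_dockerfile() {
  cat > "$1" <<'EOF'
# Stage 1: full Go toolchain, used only to compile
FROM golang:1.18-buster AS builder
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /out/app .

# Stage 2: small runtime image that receives only the compiled binary
FROM debian:buster-slim
COPY --from=builder /out/app /usr/local/bin/app
ENTRYPOINT ["/usr/local/bin/app"]
EOF
}
```

After write_multistage_dockerfile Dockerfile, a regular docker build -t myapp . produces an image that contains the binary but none of the compiler, sources, or module cache from the first stage.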

By incorporating these practices into your Docker image-building process, you can achieve more efficient and smaller container images. This not only reduces resource consumption but also improves container startup times and enhances overall system performance. Additionally, smaller images are easier to manage, distribute, and deploy across different environments, making them a crucial aspect of a well-optimized containerization strategy.

Layering and Caching Problem

A Docker image is composed of multiple layers stacked on top of each other. Each layer represents a specific modification to the file system (inside the container), such as adding a new file or modifying an existing one. Once a layer is created, it becomes immutable, meaning it can’t be changed. The layers of a Docker image are stored in the Docker engine’s cache, which ensures the efficient creation of Docker images.

As a general rule, any Dockerfile instruction that modifies the file system creates a new layer. Instructions such as LABEL, ENTRYPOINT, and CMD do not modify the file system (they only add metadata or configuration to the image), so they do not add layers that increase the size of the Docker image.

When you build a Docker image a second time without changing the Dockerfile or the files it copies, Docker recognizes that it already has the layers it is about to build. It does not rebuild them; instead, it reuses the layers stored in the cache, which accelerates the build.

Reducing build time is essential for faster development cycles and more efficient container deployment, and it goes hand in hand with minimizing unnecessary steps and image size. Several strategies can be employed to achieve this goal:

1. Leverage Build Cache:

Docker uses a caching mechanism during the build process. Take advantage of this cache by structuring your Dockerfile carefully. Place frequently changing instructions towards the end of the file, and leverage intermediate images that have been cached to avoid redundant steps.

2. Minimize Dependencies and Files:

Keep your dependencies and files as minimal as possible. Avoid installing unnecessary packages or files that are not required for the application’s runtime. Smaller images have shorter build times and faster container startup.

# DON'T
FROM node:18-alpine
WORKDIR /app
# Copying the whole build context first means that any change to any file
# invalidates this layer, so the yarn install layer below is rebuilt
# on every build
COPY . .
RUN yarn install --production
CMD ["node", "src/index.js"]

# DO
FROM node:18-alpine
WORKDIR /app
# Restructure the Dockerfile so that the dependency layer can be cached.
# For Node-based applications, dependencies are defined in package.json
# (and locked in yarn.lock), so copy only those files first, install the
# dependencies, and then copy everything else. The yarn install layer is
# then rebuilt only when package.json or yarn.lock changes.
COPY package.json yarn.lock ./
RUN yarn install --production
COPY . .
CMD ["node", "src/index.js"]

3. Parallelize Build Steps:

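If possible, parallelize independent build steps. Within a single RUN instruction, independent commands can be started in the background with & and collected with wait; independent stages of a multi-stage build can also be built concurrently by BuildKit. This can speed up the build process, especially on systems with multiple CPU cores.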
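When backgrounding commands with &, the build must still wait for every job and fail if any of them failed; otherwise errors are silently lost. The following sketch shows that pattern in plain shell, using a hypothetical helper named run_both; the same logic can be placed inside a single RUN instruction.

```shell
# Sketch: run two independent commands concurrently and fail if either fails.
run_both() {
  sh -c "$1" &
  first=$!
  sh -c "$2" &
  second=$!
  status=0
  # wait returns the exit status of the given job, so failures are not lost
  wait "$first" || status=1
  wait "$second" || status=1
  return "$status"
}
```

For example, run_both 'curl -fsSLO https://example.com/a.tar.gz' 'curl -fsSLO https://example.com/b.tar.gz' would download two archives at the same time; the URLs are placeholders.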

4. Caching External Dependencies:

If your application relies on external dependencies such as libraries or packages, consider caching these dependencies locally or in a private package repository. This reduces the need to download them repeatedly during the build process.

5. Avoid Redundant Commands:

Review your Dockerfile and eliminate redundant commands that don’t contribute to the final image. For example, if a previous step already copies a directory, avoid copying the same directory again in subsequent steps.

By applying these strategies, you can significantly reduce the build time in your Docker build process, leading to faster development cycles and more efficient container image creation. Faster builds improve developer productivity and enable quicker iterations during the development and testing phases.

Security Vulnerabilities Problem

Avoiding security vulnerabilities in Dockerfiles is crucial for ensuring the safety and integrity of your containerized applications. One significant area of concern is handling sensitive information, such as passwords, API keys, or cryptographic keys, also known as secrets. Exposing secrets in a Dockerfile or the resulting image can lead to serious security breaches. To mitigate this risk, several best practices should be followed when dealing with secrets:

1. Avoid Hardcoded Secrets:

Hardcoding secrets directly into the Dockerfile is a significant security risk. Do not place sensitive information in ENV or ARG instructions, or in any other form that is visible within the Dockerfile or the final image.

2. Avoid Including Configuration Data In Container:

Avoiding the inclusion of configuration data in containers is a fundamental principle for ensuring security and adhering to the Twelve-Factor App philosophy. The Twelve-Factor App methodology provides best practices for building modern, scalable, and maintainable applications in a cloud-native environment. One of its key factors is configuration, which advocates separating configuration from code. In practice, supply configuration through environment variables, mounted configuration files, or dynamic configuration management such as Kubernetes ConfigMaps, HashiCorp Vault, or Docker Secrets.

3. Utilize Environment Variables:

A more secure approach for handling secrets is to supply them as environment variables at container runtime rather than at build time. Secrets can then be passed into the container from the host system or the container orchestration platform. This way, secrets remain external to the Dockerfile and the container image, reducing the risk of exposure.

Example (Dockerfile):

# Do not assign secret values with ENV; they would be stored in the image.
# API_KEY and DATABASE_PASSWORD are expected to be provided at runtime
# by the host system or the orchestration platform.
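Passing the values when the container starts, rather than storing them in the image, might look like the sketch below. The function names, image name, variable names, and env-file path are placeholders.

```shell
# Sketch: supply secrets when the container starts instead of storing them in the image.

# Forward variables already set in the caller's shell; the values themselves
# do not appear on the command line.
run_with_host_env() {
  docker run --rm -e API_KEY -e DATABASE_PASSWORD "$1"
}

# Read KEY=value pairs from a file that is kept out of version control.
run_with_env_file() {
  docker run --rm --env-file "$2" "$1"
}
```

With -e NAME and no value, Docker copies the variable from the environment of the docker client, so the secret never has to be written into the Dockerfile or a script. On an orchestration platform, the equivalent is usually a secret object injected into the container's environment or mounted as a file.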

4. Setting User Permissions:

Another crucial security practice is to avoid running containers as the root user. Running containers with root privileges can lead to elevated risks, as potential attackers may exploit vulnerabilities to gain unauthorized access to the host system. Instead, create and use non-root users within the container to execute processes. This helps limit the impact of potential security breaches and restricts unauthorized access to sensitive resources.

Example (Dockerfile):

# Create a non-root user and set appropriate permissions
RUN groupadd -r myapp && useradd -r -g myapp myuser
USER myuser

By adhering to these security best practices, you can significantly enhance the security posture of your Dockerized applications. Keeping secrets external to the Dockerfile, utilizing environment variables, and running containers with non-root users all contribute to reducing the risk of security vulnerabilities. Alongside these practices, it is also essential to regularly update base images, apply security patches promptly, and follow other security best practices to maintain a robust and secure container environment.

Non-reproducible Builds and Container Running Problem

Non-reproducible builds and container running issues can cause inconsistencies and inefficiencies in the development and deployment process. To address these challenges, it is essential to follow best practices related to the order of instructions and maintaining readable and maintainable Dockerfiles:

1. Optimizing Build Caching:

Docker utilizes caching during the build process to speed up subsequent builds. To optimize build caching, it is crucial to place frequently changing instructions towards the end of the Dockerfile. This ensures that the cache remains valid for unchanged layers, reducing build times for subsequent runs.

2. Grouping Similar Instructions:

Group similar instructions together in a single RUN command. Combining related commands reduces the number of layers created in the Docker image, making it more efficient and easier to manage.

Example:

RUN apt-get update && apt-get install -y package1 package2 package3 \
    && apt-get clean

3. Reordering for Efficiency:

Arrange instructions in an order that optimizes build times and resource utilization. For instance, place instructions that are least likely to change towards the top, while keeping frequently changing instructions towards the bottom.

4. Keeping Dockerfiles Readable and Maintainable:

a. Proper Formatting and Indentation:

Maintain consistent formatting and indentation to enhance readability. Properly aligned instructions make the Dockerfile more accessible to developers and facilitate quick comprehension.

b. Adding Descriptive Comments:

Include comments in the Dockerfile to explain the purpose of different instructions. Comments provide insights into the reasoning behind certain decisions, making it easier for others to understand and modify the Dockerfile.

c. Organizing Instructions:

Structure the Dockerfile logically by organizing instructions based on their purpose. Group base image configuration, dependency installation, environment setup, and application-specific commands separately for better organization.

Example:

# Set the base image (pinned to a specific tag rather than latest)
FROM python:3.11-slim

# Install necessary packages
RUN apt-get update \
    && apt-get install -y package1 package2 package3 \
    && apt-get clean

# Set environment variables
ENV ENV_VARIABLE=value

# Copy application files
COPY . /app

# Set the working directory
WORKDIR /app

# Define the entry point
ENTRYPOINT ["python", "app.py"]

5. Avoiding the inclusion of magic files and data directly in the Dockerfile:

Avoiding the inclusion of magic files and data directly in the Dockerfile using the COPY command is a best practice that promotes clean and maintainable Dockerfiles. The term “magic files” refers to files or data that are copied into the container image without explicit knowledge of their contents or sources. This practice can lead to several issues and is discouraged for the following reasons:

Obscured Dependencies:
Including magic files in the Dockerfile hides the explicit dependencies of the application. This can make it challenging to understand which files are necessary for the application to function correctly.
Reproducibility Concerns:
Magic files may change over time or may be updated from external sources, leading to non-reproducible builds. This can result in inconsistencies and unexpected behavior when deploying the same image in different environments.

To avoid including magic files and data in the Dockerfile, it is recommended to use explicit and targeted COPY commands to copy only the necessary files into the container image. Each COPY command should have a clear source and destination, making it evident which files are being added to the image.

Example of explicit COPY commands:

# Copy only the necessary application code
COPY app /app

# Copy specific configuration files
COPY config/app.conf /etc/app.conf
COPY config/db.properties /etc/db.properties

Additionally, consider using a .dockerignore file to exclude unnecessary files and directories from the build context, so that they cannot be copied into the container image. This further reduces the image size and ensures that only essential files are included in the final image.
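A starting point for such a file is sketched below. The helper name is hypothetical and the entries are typical examples, not a complete list for any particular project.

```shell
# Sketch: create a .dockerignore in the given build-context directory.
write_dockerignore() {
  cat > "$1/.dockerignore" <<'EOF'
.git
node_modules
*.log
.env
EOF
}
```

Calling write_dockerignore . in the project root keeps version-control metadata, locally installed dependencies, logs, and local environment files out of the build context, so a broad COPY . . cannot pick them up.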

6. Leverage ENV in Dockerfile to have clear instructions to run a container:

Leveraging ENV in the Dockerfile is a best practice that provides clear instructions for running containers. The ENV instruction is used to set environment variables inside the container, allowing for easy configuration and flexibility during runtime. Here are the benefits of using ENV in Dockerfiles:

Clear and Configurable Environment:

Setting environment variables with ENV makes it explicit which variables are used by the containerized application. Developers and operators can easily see and modify these variables without having to inspect the Dockerfile or the container’s entry point script.

Easier Parameterization:

Environment variables allow container configurations to be parameterized and decoupled from the Dockerfile. This enables the same Docker image to be used across various environments, such as development, testing, and production, by simply changing the environment variables.

Security and Secret Management:

When using environment variables, sensitive information like passwords and API keys can be passed into the container at runtime instead of hardcoding them directly into the Dockerfile. This improves security by keeping sensitive data out of version-controlled files.

Maintainable Dockerfiles:

By defining environment variables with ENV, Dockerfiles become more maintainable and readable. It’s easier to understand which configuration values are expected and to make changes without affecting the core application logic.

Container Orchestration Compatibility:

Container orchestration platforms like Kubernetes, Docker Compose, or OpenShift can easily manage and update containers’ environment variables, making it seamless to scale and manage containerized applications.

Example (Dockerfile):

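# Set the base image (pinned tag; a Python image is assumed because the entry point runs python)
FROM python:3.12-slim

# Set non-sensitive defaults; each can be overridden at runtime with `docker run -e`
ENV APP_PORT=8080
ENV DB_HOST=db.example.com
ENV DB_USERNAME=myuser
# Do not bake secrets into the image: pass DB_PASSWORD at runtime,
# e.g. `docker run -e DB_PASSWORD=... <image>`

# Copy application files
COPY . /app

# Set the working directory
WORKDIR /app

# Define the entry point
ENTRYPOINT ["python", "app.py"]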

With the use of ENV, Dockerfiles have become more versatile and maintainable. It allows for easy configuration changes, enhances security, and improves the overall experience of running containers. By adopting this practice, developers can create more flexible and scalable containerized applications that are well-suited for various deployment scenarios.
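
On the application side, the variables set with ENV (or overridden at runtime) still have to be read somewhere. The following is a minimal sketch of how a hypothetical app.py might do that; the Settings class and load_settings function are illustrative names, not part of the original example, and the defaults simply mirror the values shown in the Dockerfile.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Configuration read from environment variables (hypothetical helper)."""
    app_port: int
    db_host: str
    db_username: str
    db_password: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from the given mapping, or from os.environ by default."""
    env = os.environ if env is None else env

    # The password has no default: it must be supplied when the container starts.
    password = env.get("DB_PASSWORD")
    if not password:
        raise RuntimeError("DB_PASSWORD must be provided at runtime")

    return Settings(
        app_port=int(env.get("APP_PORT", "8080")),
        db_host=env.get("DB_HOST", "db.example.com"),
        db_username=env.get("DB_USERNAME", "myuser"),
        db_password=password,
    )
```

Failing fast when a required variable is missing tends to make misconfigured containers easier to diagnose than letting the application start and fail on its first database call.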

By adhering to these practices, developers can improve the consistency and reproducibility of builds, reduce container running issues, and create more readable and maintainable Dockerfiles. This ensures smoother development workflows, facilitates team collaboration, and leads to more reliable and efficient containerized applications.

How our quality engineers apply continuous testing in a cloud environment

June 4, 2023 by hnminh, posted in Automation, DevOps, microservice

Key Takeaways

Infrastructure as Code (IaC): IaC is an effective approach for provisioning and managing the test environment in the cloud. Tools like Terraform or CloudFormation let organizations create and configure the necessary infrastructure resources from code. An effective practice is to keep the infrastructure code and the application provisioning code in separate folders, which makes both easier to organize, manage, and maintain.
Test Automation: Emphasizing the use of test automation frameworks and tools is crucial for executing tests with speed and efficiency. One key aspect is integrating automated tests seamlessly into the development CI/CD pipeline. By doing so, automated tests become an integral part of the continuous integration and deployment processes, offering immediate feedback on the software’s quality. This integration ensures that tests are executed consistently and automatically at each pipeline stage, providing rapid insights into the software’s functionality, performance, and reliability.
Continuous Integration and Continuous Deployment (CI/CD): Leveraging GitOps and ArgoCD for efficient and reliable software deployment is essential. Implementing CI/CD pipelines automates the deployment of software changes, ensuring a streamlined release process. By incorporating GitOps principles, the desired state of the infrastructure is defined and version-controlled using a Git repository. ArgoCD, as a GitOps tool, continuously monitors the repositories for changes and automatically deploys the application to the target environment. Additionally, integrating relevant test suites into the CI/CD pipeline at each stage facilitates early issue identification, ensuring the software is thoroughly tested before deployment. By combining GitOps, ArgoCD, and comprehensive testing, organizations can achieve a robust and reliable deployment process in a cloud-native environment.
Service Virtualization: Service virtualization techniques can be employed to simulate dependent services or components not readily accessible in the test environment. Establishing an API interface contract using gRPC protobuf is crucial to support service virtualization effectively. The API interface contract defines simulated services’ expected requests, responses, and behaviors. By utilizing gRPC protobuf, service virtualization enables testing in isolation, ensuring that external dependencies do not hinder the testing process. This approach facilitates accurate and controlled testing scenarios, even when certain services or components are unavailable, enabling thorough testing of the system’s functionality and interactions.
Monitoring and Logging: Implementing comprehensive monitoring and logging solutions, including synthetic monitoring and tracing with OpenTelemetry, is essential for capturing relevant metrics, tracking system behavior, and detecting anomalies. Synthetic monitoring creates simulated transactions and interactions to monitor the performance and availability of the system proactively. Tracing, facilitated by OpenTelemetry, provides end-to-end visibility into requests as they traverse the various components of the system, which helps identify performance bottlenecks and troubleshoot issues. Together, these techniques give organizations insight into system health, performance, and stability, so that they can identify and resolve issues quickly.
Collaboration and Communication: Foster close collaboration and communication between quality engineers, developers, and other stakeholders. Clear communication channels, regular meetings, and shared documentation help ensure everyone is aligned and working towards the same goals.

A methodology for our quality engineering team to perform continuous testing in a cloud environment

Infrastructure as Code (IaC) for consistency and reproducibility in multiple environments

Managing infrastructure in a multi-environment cloud setup presents challenges, particularly in maintaining consistency and reproducibility. Inconsistent configurations across development, staging, and production environments can hinder deployments and cause operational inefficiencies. Coordinating updates among multiple teams becomes complex and time-consuming. Maintaining environment-specific settings, like network configurations and access controls, adds complexity and increases the risk of misconfigurations. Establishing strong infrastructure-as-code practices, leveraging version control systems, and automating processes is vital to ensure reliable and scalable infrastructure management across diverse cloud environments.

By using Terraform together with source control, my team can effectively address the challenges of managing infrastructure in a multi-environment cloud setup. Terraform’s Infrastructure as Code approach ensures declarative configurations, enabling consistent and reproducible deployments. Source control systems like Git offer versioning, collaboration, and change management, facilitating proper tracking and documentation of infrastructure changes. Terraform modules enable the creation of reusable components, reducing duplication and promoting consistency. This integrated approach establishes a streamlined workflow, simplifying infrastructure management, deployments, and change coordination across diverse cloud environments.

To enhance the organization and modularity of the Terraform source code in our solution, we can adopt a folder structure that separates environments. Each environment folder can contain submodules representing specific cloud resources, such as EKS (Elastic Kubernetes Service), VPC (Virtual Private Cloud), IAM (Identity and Access Management), and more. This approach allows us to encapsulate the configuration and dependencies of each resource within its respective submodule, promoting reusability and maintainability. With this skeleton for our Terraform source code, we can easily manage and scale our infrastructure across different environments while maintaining a clear and structured codebase.
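
To make the folder structure concrete, here is a small sketch that enumerates one possible layout. The environment names come from the environments mentioned earlier, the submodule names from the resources listed above; the terraform root folder and the main.tf file name are assumptions for illustration, not the team's actual repository layout.

```python
from pathlib import PurePosixPath
from typing import List, Sequence

# Assumed names, for illustration only.
ENVIRONMENTS = ["development", "staging", "production"]
MODULES = ["vpc", "iam", "eks"]


def terraform_layout(
    environments: Sequence[str],
    modules: Sequence[str],
    root: str = "terraform",
) -> List[str]:
    """Return one main.tf path per (environment, submodule) pair."""
    paths = []
    for env in environments:
        for module in modules:
            paths.append(str(PurePosixPath(root, env, module, "main.tf")))
    return paths


if __name__ == "__main__":
    for path in terraform_layout(ENVIRONMENTS, MODULES):
        print(path)
```

Each environment folder then holds the same set of submodules, so a change to one environment stays visibly separate from the others in code review.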

To effectively manage the Terraform state in our solution, we leverage Terraform Cloud. By integrating Terraform Cloud into our workflow, we can centralize the storage and management of our state files. Terraform Cloud provides a secure and scalable solution for storing and sharing state, ensuring consistent collaboration across team members. With Terraform Cloud’s version control integration, we can easily track changes to our infrastructure over time and revert to previous states if necessary.

Image: Terraform Cloud

Driving Continuous Testing Excellence with Automated Solutions in the Cloud

Developing and testing applications in a cloud environment presents a myriad of challenges. Testers must navigate the rapid release cycles of the developer team, grappling with the constant introduction of new features and enhancements. Furthermore, the intricate and expansive nature of cloud infrastructure exacerbates these challenges. Multiple components, services, and configurations pose difficulties in ensuring consistent and reliable testing throughout the entire environment. Consequently, testing delays, compromised quality, and bottlenecks in the development process can arise. Traditional testing approaches may prove insufficient in meeting the demands imposed by frequent updates and deployments.

To tackle these challenges head-on, we adopt a proactive approach by implementing agile testing practices, embracing infrastructure-as-code principles, and harnessing scalable testing tools and automation. These strategic measures fortify our testing capabilities and effectively address the challenges at hand. Doing so empowers our testers to validate new functionalities and changes while upholding optimal quality efficiently. Our unwavering commitment lies in delivering high-quality solutions that cater to the evolving demands of our cloud-based applications.

Inspired by Kent C. Dodds’ esteemed Testing Trophy model, our testing team places significant emphasis on comprehensive testing across various levels. Notably, we prioritize the implementation of unit tests, integration tests, end-to-end tests, and static code analysis. This comprehensive approach ensures extensive coverage and early detection of issues, bolstering software quality. By embracing this model, our aim is to establish a robust foundation for our testing efforts, enhancing software quality and delivering dependable and resilient solutions to our stakeholders. This methodology enables us to forge a well-rounded testing strategy aligned with industry best practices, resulting in optimal test coverage and improved effectiveness in identifying and addressing potential software vulnerabilities.

The Testing Trophy – Kent C. Dodds (https://twitter.com/kentcdodds/status/960723172591992832)

Integrating the automated tests from the Testing Trophy model into the CI/CD pipeline is critical to delivering high-quality software. By automating the various levels of testing, such as unit tests, integration tests, and end-to-end tests, organizations gain faster feedback, earlier bug detection, and better overall software quality. Our team uses Bitbucket Pipelines to integrate these automated tests into our CI/CD workflows, enabling continuous testing and validation throughout the entire software development lifecycle. To centralize and visualize the test results, we rely on ReportPortal.io, a platform that provides valuable insights and detailed metrics. This enables us to assess the return on investment (ROI) of our test automation efforts and make well-informed decisions to optimize our testing practices further.

To effectively automate our testing processes, we utilize a combination of in-house and third-party tools. For API testing, we have developed a bespoke in-house framework tailored to our unique requirements. This framework offers us the flexibility, customization, and seamless integration necessary to test the functionality and reliability of our APIs efficiently. Additionally, we leverage Playwright, a powerful open-source automation tool, for web-based testing. Playwright provides cross-browser compatibility, allowing us to automate web interactions, validate UI elements, and easily conduct end-to-end tests. We ensure comprehensive and robust test coverage across all our applications by employing our in-house API framework and Playwright for web automation.

Streamlining Continuous Testing in the Cloud with GitOps

Combining GitOps with Helm charts has changed how we manage service versioning across our entire cloud platform. By adopting GitOps as our guiding principle, we have achieved a unified and version-controlled method of deploying and updating services. Git acts as the single source of truth, enabling us to track and manage changes to service configurations and infrastructure effectively. With Helm, the package manager for Kubernetes, we define and deploy services consistently across different environments. A Helm chart packages and versions a service together with its dependencies, configurations, and deployment parameters. Consequently, each service is deployed with the correct version and configuration, promoting consistency and mitigating configuration drift. The combination of GitOps and Helm charts gives us greater control, traceability, and reproducibility, resulting in more reliable and stable deployments within the cloud environment.

To further enhance our testing practices, we have implemented the ArgoCD App of Apps pattern together with Bitbucket Pipelines. The App of Apps pattern lets us manage multiple applications and their configurations as a cohesive unit, which simplifies the management of complex testing environments throughout the software development lifecycle. By defining an “app of apps,” we can deploy and update multiple services and configurations together, ensuring consistency while reducing the effort required to manage individual components. Bitbucket Pipelines integrates with our Git repositories and gives us a CI/CD platform well suited to automated testing. We have defined comprehensive test pipelines in Bitbucket Pipelines that execute unit tests, integration tests, and end-to-end tests. Combining the App of Apps pattern, Bitbucket Pipelines, and GitOps with Helm charts gives us a comprehensive and efficient testing framework that ensures the reliability and stability of our platform while promoting collaboration among team members and accelerating our development processes.
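
In the App of Apps pattern, a parent ArgoCD Application points at a Git directory that contains one child Application manifest per service. The sketch below generates such child manifests as plain dictionaries. The repository URL, service names, charts/ path convention, and namespaces are hypothetical; only the manifest field names follow the ArgoCD Application resource.

```python
import json
from typing import Dict, List


def child_application(
    name: str, repo_url: str, path: str, namespace: str, revision: str = "HEAD"
) -> Dict:
    """Build one child Application manifest as a dictionary."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            "project": "default",
            # Git location of the Helm chart for this service.
            "source": {"repoURL": repo_url, "path": path, "targetRevision": revision},
            # Deploy into the same cluster that ArgoCD runs in.
            "destination": {
                "server": "https://kubernetes.default.svc",
                "namespace": namespace,
            },
            # Keep the cluster in sync with Git automatically.
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }


def app_of_apps(services: List[str], repo_url: str, namespace: str) -> List[Dict]:
    """One child Application per service, assuming charts live under charts/<service>."""
    return [
        child_application(service, repo_url, f"charts/{service}", namespace)
        for service in services
    ]


if __name__ == "__main__":
    manifests = app_of_apps(
        ["orders", "payments"], "https://example.com/platform.git", "testing"
    )
    print(json.dumps(manifests, indent=2))
```

Because every child is generated from the same template, adding a service to the test environment becomes a one-line change that is reviewed and versioned in Git like any other.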

Harnessing Service Virtualization to Empower Teams and Overcome Inaccessible Dependent Services

Service Virtualization, combined with the gRPC protobuf as a contract interface, offers an effective solution for building a robust Service Virtualization infrastructure. To further enhance the capabilities of Service Virtualization, tester teams can leverage specialized tools like Camouflage or Hoverfly to simulate edge cases of third-party services during integration testing.

Camouflage and Hoverfly are powerful tools that enable testers to create virtual representations of third-party services and simulate various scenarios, including edge cases, failures, and performance bottlenecks. By configuring these tools to mimic the behavior and responses of the actual services, tester teams can thoroughly test their systems’ resilience and performance under different conditions.

Leveraging gRPC protobuf as the contract interface adds an additional layer of efficiency and compatibility when building Service Virtualization. The protobuf specification provides a clear definition of the service structure and behavior, facilitating seamless integration with Service Virtualization tools. This enables testers to accurately emulate the communication patterns and responses of third-party services, ensuring comprehensive testing and validation of their own systems.

Building external virtualized services by leveraging gRPC protobuf further extends the benefits of Service Virtualization. These virtualized services enable the team to conduct performance testing with multiple scenarios, simulating the behavior of third-party services. By defining various scenarios, such as high loads or specific error responses, the team can evaluate their system’s performance and identify potential bottlenecks or scalability issues.

With Service Virtualization, the team can run performance tests against these external virtualized services and assess their system’s behavior under different conditions. This approach provides valuable insights into system performance, helping the team optimize their software and ensure its reliability in the cloud environment.

With external virtualized services built on gRPC protobuf, or with tools like Camouflage or Hoverfly, tester teams can effectively simulate edge cases and challenging scenarios when integrating with third-party services. This comprehensive testing approach ensures the robustness, scalability, and reliability of their systems, ultimately delivering high-quality software in the cloud environment.
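
The idea of scenario-driven virtualization can be illustrated without any particular tool. The following is a minimal in-process sketch, not the Camouflage or Hoverfly API: VirtualService, Scenario, and the method name PaymentService/Charge are hypothetical, and the status strings simply reuse gRPC status code names.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Scenario:
    """A canned response for one method of the simulated third-party service."""
    status: str
    body: dict
    delay_seconds: float = 0.0


class VirtualService:
    """Stands in for a dependency that is unavailable in the test environment."""

    def __init__(self) -> None:
        self._scenarios: Dict[str, Scenario] = {}
        self.calls: List[Tuple[str, dict]] = []

    def when(self, method: str, scenario: Scenario) -> None:
        """Register the scenario to replay for a given method name."""
        self._scenarios[method] = scenario

    def call(self, method: str, request: dict) -> Tuple[str, dict]:
        """Record the call, then replay the configured scenario."""
        self.calls.append((method, request))
        scenario = self._scenarios.get(method)
        if scenario is None:
            return "UNIMPLEMENTED", {}
        time.sleep(scenario.delay_seconds)  # simulate a slow dependency
        return scenario.status, scenario.body


if __name__ == "__main__":
    payments = VirtualService()
    payments.when(
        "PaymentService/Charge",
        Scenario(status="UNAVAILABLE", body={"message": "provider down"}),
    )
    print(payments.call("PaymentService/Charge", {"amount": 100}))
```

Swapping the registered scenario between tests is what allows the same integration test to cover the success path, an error response, and a slow dependency.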

Using Monitoring and Logging for Shift-Right Testing in the Cloud

Utilizing monitoring, logging, and synthetic monitoring as part of the Shift-Right Testing strategy in the Cloud, while adhering to the DevOps Hourglass model, provides organizations with the means to guarantee the reliability and performance of their cloud-based applications.

A Practical Guide to Testing in DevOps – Katrina Clokie

The DevOps Hourglass model places significant emphasis on establishing continuous feedback loops between development, operations, and quality engineering teams. Within the context of Shift-Right Testing, monitoring and logging serve as pivotal components for gathering valuable insights from live production environments. By meticulously monitoring essential metrics, collecting log data, and scrutinizing events, organizations gain real-time visibility into the performance, availability, and behavior of their systems. This proactive approach enables them to swiftly identify and address issues, elevate system performance, and enhance the overall user experience.

Complementing traditional monitoring techniques, synthetic monitoring replicates user interactions and transactions within the application. By executing synthetic transactions from diverse locations, organizations can vigilantly monitor the system’s performance, responsiveness, and availability. This methodology facilitates the identification of potential performance bottlenecks, anomalies, and issues that could adversely impact end-users. By integrating synthetic monitoring into the Shift-Right Testing strategy, organizations gain comprehensive insights into the application’s performance under various conditions, allowing them to preemptively identify and resolve potential issues before they manifest for users.

The amalgamation of monitoring, logging, and synthetic monitoring empowers organizations to adopt a proactive approach in Shift-Right Testing within the Cloud. They can continuously monitor system performance, availability, and user experience, ensuring that applications meet the expected standards and deliver an uninterrupted experience. By leveraging these techniques, organizations can promptly detect issues, optimize system performance, and iterate on their applications based on real-time insights. This comprehensive approach aligns harmoniously with the principles of DevOps, fostering collaboration, feedback, and continuous improvement throughout the software development lifecycle in the dynamic cloud environment.
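
A synthetic check boils down to running a scripted transaction on a schedule, timing it, and comparing the result with a threshold. Here is a minimal sketch under that assumption; run_synthetic_check and CheckResult are illustrative names, and the transaction is passed in as a plain callable so that the example does not depend on any specific monitoring product.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckResult:
    name: str
    ok: bool
    latency_ms: float
    detail: str = ""


def run_synthetic_check(
    name: str, transaction: Callable[[], None], max_latency_ms: float
) -> CheckResult:
    """Run one simulated user transaction and judge it by success and latency."""
    start = time.perf_counter()
    try:
        transaction()
    except Exception as exc:  # any failure of the scripted journey fails the check
        latency_ms = (time.perf_counter() - start) * 1000
        return CheckResult(name, False, latency_ms, f"failed: {exc}")

    latency_ms = (time.perf_counter() - start) * 1000
    if latency_ms > max_latency_ms:
        return CheckResult(name, False, latency_ms, "too slow")
    return CheckResult(name, True, latency_ms)
```

In practice the callable would perform a real user journey, such as logging in and loading a dashboard, and the result would be exported as a metric or log entry for alerting.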

Fostering Collaboration and Communication: Key Enablers for Continuous Testing Success

Acknowledging the paramount significance of effective collaboration and communication among quality engineers, developers, and stakeholders, our team has placed great emphasis on enhancing these aspects in our continuous testing endeavors. Drawing inspiration from the book “Fifty Quick Ideas To Improve Your Tests” by Gojko Adzic, David Evans, and Tom Roden, we have implemented two key techniques to strengthen our collaborative practices: “Define a shared big-picture view of quality” and “Design tests together with other teams.”

Fifty Quick Ideas to Improve Your Tests - Gojko Adzic, David Evans and Tom Roden — Define a shared big-picture view of quality

Our adoption of the first technique centers around establishing a collective understanding of quality objectives and expectations across all teams engaged in the testing process. By aligning ourselves with critical quality attributes and product goals, we ensure that our testing efforts remain focused on delivering the desired outcomes. This shared understanding acts as a guiding principle, facilitating informed decision-making and effective prioritization of testing activities. Moreover, it nurtures an environment conducive to communication and collaboration by providing a common language and framework for quality discussions within our team.

In parallel, we have enthusiastically embraced the technique of designing tests collaboratively with other teams. We actively foster an environment where quality engineers, developers, and relevant stakeholders collaborate to design tests in unison. This collaborative approach enables us to draw on diverse perspectives and expertise, creating comprehensive and robust test suites. By leveraging the collective knowledge and insights of our quality engineers and developers, we gain a deeper understanding of the system under test, enabling us to design more effective and thorough tests. Furthermore, we actively engage other teams, including product managers, UX designers, and business analysts, to ensure that our tests seamlessly align with product requirements and user expectations. This collaborative approach elevates the quality of our tests and promotes cross-functional learning and knowledge sharing within our organization.

Ultimately, our team profoundly recognizes that effective collaboration and communication among quality engineers, developers, and stakeholders are pivotal to achieving success in continuous testing. Through the implementation of the “Define a shared big-picture view of quality” and “Design tests together with other teams” techniques, we have witnessed remarkable improvements in collaboration and the overall quality of our testing efforts. By aligning our goals, leveraging diverse expertise, and actively involving relevant stakeholders, we have attained higher quality, agility, and customer satisfaction levels in our continuous testing endeavors.

SHARING ABOUT TESTING WITH MICROSERVICES SYSTEMS – PART 2

February 20, 2022 (updated February 21, 2022) by hnminh, posted in Automation, Manual, microservice

As I mentioned in Part 1, in the following parts I will share the problems I have run into while testing a system built on a Microservices architecture. If you have not read the earlier parts, you can catch up using the list alongside.

Sharing About Testing with Microservices Systems – Part 1

What would you do when you join a project that uses a Microservices architecture?

One fine day you receive an offer to join a new project to build a truly grand product. Everyone says it will adopt a Microservices architecture. So what should you do to help the team, and the testers who join later, avoid swimming in a mess where everything is tangled and you yourself cannot explain to new teammates how the system works?

😩 To be honest, I have not been through that experience myself. But judging from my experience with my current system, if you get to join such a project from the very beginning, it is a great opportunity to work with the developers to lay a solid foundation for the product. In my view, for a new product that is still being developed toward an MVP (Minimum Viable Product), you can ask the following questions while preparing to build it:

Are Microservices really necessary at this point? This question is a way for the whole team to sit down and examine why we are choosing to design the system as Microservices, and what the advantages and disadvantages of that choice are during the MVP stage. From there, the team can agree on how to build and develop the product from the foundation up.
If we use Microservices, what is the bounded context of each service? Do we need to develop according to DDD (Domain-Driven Design)?
How will the system be documented? Some of you may wonder: if we follow Agile, the Agile Manifesto says “Working software over comprehensive documentation”, which roughly means that building the product and getting it working matters more than producing complete, exhaustive documentation. Note the word “comprehensive”: it implies that you need to know how much documentation is just enough, balancing software development against “just enough” documentation so that stakeholders can still refer to it later.
How can the internal services be tested? Asking this question helps you steer the team toward the testability of each service from the start, so that you do not finish development and then discover there is no way to test what you want to test.
Are there any external dependency services that could block testing or product development? This question also helps you steer the team toward the testability of the services.

Those are the things I think may help when you get to join a project that designs and builds a product on a Microservices architecture from the very beginning. The questions above can give everyone a clearer direction for building and testing the product.

How can a newcomer learn about a project's Microservices system in the post-MVP stage?

Back to my own story 🥲: when I joined the project, the product had already been under development for more than 6 years. In terms of a startup's life cycle, the product was still at a young stage. And because it was young, racing to ship features and reach an MVP early was only natural.

One consequence was that when I joined, the system documentation was rather fragmented and a good deal of it was out of date. The testing documentation was not really complete either. Balancing faster product development against product quality is always a hard problem in any development model; whether you follow Waterfall or Agile, it remains very hard to solve.

So the first problem I needed to solve was how to learn the system as quickly as possible 🥲

Confluence pages, or wherever documents and notes are kept, are your friends

At my company, all documents are currently stored on Confluence pages. Although some documents are missing or outdated, at least I get some initial information about the system, even if that information is still fragmented and not yet connected.

When searching a document store, the skill of building search keywords is very important; it is similar to the skill of searching Google when you hit a problem. Some keywords I often use to find system documents are: “Service_name SAD”, “Service_name sequence diagram”, “Service_name API document”, “Service_name Architecture”, “Service_name Error Code”, or simply the name of the service you want to learn about.
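
The keyword patterns above are easy to generate for any service. A tiny sketch follows; the function name and the example service name are made up for illustration.

```python
from typing import List

# Suffixes taken from the keyword patterns listed above.
SUFFIXES = ["SAD", "sequence diagram", "API document", "Architecture", "Error Code"]


def search_keywords(service_name: str) -> List[str]:
    """Return the plain service name followed by the name plus each suffix."""
    return [service_name] + [f"{service_name} {suffix}" for suffix in SUFFIXES]


if __name__ == "__main__":
    for keyword in search_keywords("payment-service"):
        print(keyword)
```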

Being able to read Sequence Diagrams and understand High-level Software Architecture Design drawings is essential

Once you have the information or documents for the service you need in hand, the next step is knowing how to read and understand them. It is as if a secret martial arts manual has been delivered into your hands, and all that remains is to practice until the skill becomes your own 😄

In the following example, I borrow a diagram from TiDB that describes how it works internally:

In this sequence diagram, you can see the interaction flows among three systems: tidb, pd, and tikv. Between tidb and pd there is a separate block that queries all keys by region. Then, between tidb and tikv, there is a block that queries all the data for the region keys found in the previous block. Finally, there is the block between the client and tidb.

From the information shown in the sequence diagram, you get a clearer view of how the services interact with each other.

In addition, reading and understanding the high-level architecture gives you a broader overview of the whole underlying system. You can easily tell which services are critical, and which services are consumers and providers of one another.

In the following example, I again borrow a diagram from TiDB:

From the diagram above, I can tell that the TiDB architecture has 4 parts: TiDB, TiKV, PD, and TiSpark, and that TiSpark is a separate part that does not affect TiDB. PD is responsible for managing metadata, which relates to managing the TiDB cluster, so it is probably an important part of the TiDB system.

Use a Mindmap to build the overall picture of the whole system behind the product you work on

I think there are many ways to string small pieces of system information together into a bigger picture; personally, I keep the habit of using a Mindmap for this. It lets me easily visualize the information I have as an overall picture, and once it is on the Mindmap I can also spot where information is still missing, so that I can go and look for it.

Exploratory testing and MITM tools are an effective way to learn more about the product, in terms of both business knowledge and technical knowledge

The next thing to do is to explore the product from the perspective of an end user, together with tools that give you a technical view. Our product is developed on 2 platforms, Web and native mobile application, so I combine Chrome DevTools with MITM (man-in-the-middle) tools such as mitmproxy or Proxyman to intercept network traffic.

Here I usually perform one user journey flow and then look at which APIs the system calls during that flow. Next, I link this with the information I already have from the SAD and the sequence diagrams to get a picture from the user journey side.
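
Once the traffic of a user journey has been captured, listing the distinct APIs it touched is a simple grouping step. The sketch below assumes the captured requests have already been exported from the proxy tool as (method, URL) pairs; the function name and the example URLs are hypothetical, and no proxy-specific API is used.

```python
from typing import Dict, Iterable, List, Tuple
from urllib.parse import urlsplit


def apis_in_journey(
    captured: Iterable[Tuple[str, str]]
) -> List[Tuple[str, str, str, int]]:
    """Group captured requests by (method, host, path), keeping first-seen order.

    Query strings are ignored so that repeated calls to the same endpoint
    with different parameters are counted together.
    """
    counts: Dict[Tuple[str, str, str], int] = {}
    for method, url in captured:
        parts = urlsplit(url)
        key = (method.upper(), parts.netloc, parts.path)
        counts[key] = counts.get(key, 0) + 1
    return [(m, host, path, n) for (m, host, path), n in counts.items()]


if __name__ == "__main__":
    journey = [
        ("GET", "https://api.example.com/v1/profile?id=1"),
        ("POST", "https://api.example.com/v1/orders"),
        ("GET", "https://api.example.com/v1/profile?id=2"),
    ]
    for row in apis_in_journey(journey):
        print(row)
```

The resulting list can then be matched against the endpoints named in the sequence diagrams to see which services a given user journey actually exercises.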

This is extremely useful when you start developing new features, or when your product already has end users. If you run into issues (I call them issues because they may not necessarily be bugs), or when customers report issues through the CS (customer service) system, you can use reasoning to predict and narrow down where the issue comes from. See the following figure from the article Oracles from the Inside Out by Michael Bolton:

“Oracle is the principle or mechanism used to identify the problem. Oracle helps in making decision about the fault”

I hope Part 2 has been useful for those of you joining the development of a Microservices product, whether from the very beginning or by jumping in midway like me. I will end Part 2 here, since this has already become fairly long. See you in Part 3, where I will go on about the difficulties of developing further features and testing them 🥲

Sharing About Testing with Microservices Systems – Part 1

February 6, 2022 by hnminh, posted in Automation, Manual, microservice, Performance

Năm mới chúc mọi người có thật nhiều sức khỏe, an khang thịnh vượng, và sự nghiệp ngày càng thăng tiến

Đôi dòng tản mạn trước khi bắt đầu chuỗi bài viết, năm 2021 thì mình cũng không có nhiều bài viết lắm do 1 phần mình vừa mới thay đổi chỗ làm, và công việc cũng hơi nhiều. Bên cạnh đó thì cũng mong muốn trải nghiệm nhiều hơn về testing để có thể chia sẻ thêm cho các bạn những kinh nghiệm thực tế nhưng mà mình trải qua. Còn nói đơn giản hơn thì năm ngoái mình hơi lười 🤣 chủ yếu mình chỉ toàn post bài trên fanpage Facebook để chia sẻ vài mẹo nhỏ khi thực hiện testing. Hy vọng năm nay mình sẽ bớt lười lại và có nhiều bài viết hơn ở blog này 😮‍💨 Ví dụ như chuỗi bài viết sắp tới đây về Microservices là những kinh nghiệm và thực tế mình đã trải qua trong 1 năm vừa rồi, mình thấy có những cái hay ho có thể chia sẻ để mọi người cùng có thêm những thông tin thú vị 😄 ,vì là những kinh nghiệm do mình trải qua nên có những thứ nó sẽ được nhìn nhận ở khía cạnh từ quan điểm của mình, nên có thể các bạn có cách nhìn nào hay hơn có thể comment chia sẻ nhé 😉

What are Microservices?

Before starting this series, I want to briefly cover what Microservices are and how they differ from earlier application architectures.

First we have to go back to the old days, when I was still a tadpole, when the previous generation first got the Internet and started building software applications for users. Building an application was very simple then, like building a basic single-storey house: pour the foundation, raise the walls, put on the roof, and you have a house. Likewise, an application consists of a persistence layer (where data is stored), a business logic layer (which holds the requirements and how the application behaves), and finally a presentation layer (where information is displayed to the user). For example, suppose you want to build an application that displays pictures of good-looking people. You first need a persistence layer to store the pictures, business logic to add, delete, edit, and display them, and a presentation layer to show them to users on whatever platform. An application built this way is said to have a Monolithic Architecture.

Then people had to develop and progress 🤔 From ox carts we moved to horse carriages, from horse carriages to internal combustion engines, and gradually to the fancy cars we have today. The same happened in software. From a simple application where you click, click, and see some information, we got more requirements and more features. And gradually, from Monolithic Architecture we got Service-oriented Architecture and finally Microservices Architecture.

Back to the earlier example of the app that displays good-looking people. At first I only needed to display pictures, so the application was quite simple. After it had been running for a while, there was demand to extend it: dating, hotel booking, buying movie tickets, swiping left and right, buying a premium package, and boom, we have Tinder 😛 At that point staying with a Monolithic Architecture is still entirely feasible. But as more features keep coming, the application grows quite large, which makes it hard to maintain or extend. In addition, replacing or upgrading even the smallest thing inside the application carries risks around dependencies and compatibility with the application as a whole, for example upgrading the version of some library in the business logic layer. SOA and Microservices Architecture gradually grew out of these problems; both were created to solve the difficulties a Monolithic Architecture runs into as the application keeps growing and expanding. So how do SOA and Microservices Architecture differ from Monolithic Architecture 🥺

In both Service-oriented Architecture and Microservices Architecture, the application's features are gradually split into smaller modules. The application is divided into services, and each service serves one specific piece of business logic. In the example above, when I want to add hotel booking or movie tickets, I create a separate service for it. Then if I change something there, earlier features such as adding, deleting, and editing pictures or buying the premium package are not affected.

So what are the benefits of these two architectures over Monolithic? Clearly, once features are split into separate modules/services, developing and extending the application becomes easier. Replacing a module/service is also not too complicated. For example, say I have a module/service for booking movie tickets with CGV, and CGV suddenly no longer wants to cooperate; switching to booking with Lotte can be done easily without much impact on other services. Scaling is simpler too, because I can now scale at the module/service level. For example, suppose that after a year my users grow from 100 to 1 million, and most of them come to book hotels or movie tickets. Scaling the modules/services behind those features is easier than with Monolithic, where scaling means deploying the entire application, so parts that do not need scaling come along as an unwanted “bonus” and cost is not optimized. There are other advantages as well; see Microservices Architecture for more.

So are SOA and Microservices Architecture perfect, with no drawbacks? 😩 Unfortunately nothing is perfect. Alongside the advantages I sketched above, these two architectures have quite a few drawbacks, such as the complexity of development and testing, and the need for consistency between services when they communicate (data, integration, etc.). If you build a system in the SOA or Microservices style without a high degree of consistency (contracts between services, clear bounded contexts), then congratulations to your team: you have landed on the “pile of messy garbage” square, also known as microservice chaos, which is even worse than a Monolithic Architecture 🥲

Overall these two architectures do not differ much in philosophy, and at the level of this post I will not go into detail about their differences. Both help developers build an application as it starts to get larger, gain more features, and become more complex.

In my view, choosing among these three architectures depends largely on the needs of the application and on the business model. None is clearly superior; each has its own advantages and disadvantages. If you are interested in SOA and Microservices Architecture specifically, you can read SOA vs Microservices Architecture for more on how the two differ and their pros and cons, because at first glance they look quite similar 🤣

Difficulties in testing with Microservices Architecture

Now for the main part of this blog series 🤠 As introduced above, moving from Monolithic to SOA or Microservices brings fundamental differences, and with them problems to solve when testing. Below are the problems I experienced and worked through over the past year, plus some I have not finished solving and plan to tackle this year 🙄

How to identify the impact of, or the dependencies between, services when documentation is incomplete or outdated
How to learn the whole system from the end-user view down to the service level, which I usually call the top-down approach
How to test effectively when the system has around 60+ services, including how to do regression testing every time a service deploys a new version
How to do automation checking for a microservices system, including cutting the run time of the entire test suite to under 10-30 minutes, and the strategy for automation checking
How to do performance testing for a service that has too many dependency services
How to do shift-right testing on a large system when an important new release goes to production
How to test distributed transactions in this system

The list above covers the main problems I ran into over the past year. Some I have handled, and some are what you would call technical debt that I hope to deal with this year 🤐 In the following parts of this series I will share my experience and the approaches I used to gradually solve these problems. That is enough for a New Year opener; the rest is saved for the next part 👻

Infrastructure as Test

April 11, 2021 by vinh.nguyen, posted in Automation, Chit&Chat, DevOps, Katalon Studio, Manual, ReportPortal

Previously I wrote a theory guide on more testing approaches for mobile applications. It is easier said than implemented. Honestly, testing is not as simple as the theory.

Testing is complicated. It is more challenging if you did not plan it from the beginning, and more difficult still when you thought you could write a few automation scripts to save time and then spend three times the effort maintaining them. It is even more painful when tests have to scale and be agreed upon and supported by the whole team. You have heard about the benefits of continuous testing, so building that infrastructure always sounds good, but you cannot keep the CI flow stable.

Testing infrastructure

When I say this term, I mean the support and the ‘things’ around testing. Following the shift-left principle, testing activities are present in every part of development within an iteration.

How you integrate your team and build an infrastructure to support testing is more critical than “hey, find an automated framework and let’s start writing some test scripts.” Using an in-house framework, or even buying a commercial solution, to reduce testing effort and increase collaboration is just one part of the whole testing infrastructure.

What affects the testing infrastructure:

The test strategy
The team
The automated testing framework
The test execution
The CI/CD solutions and pipelines
The application under test, including versioning, environments, and deployments
Other types of testing

Test Strategy

This is the most important input to the outcome of the testing infrastructure. It is not just a typical test strategy that highlights what needs to be tested; it also includes the approaches for bringing the testing infrastructure to maturity. Part of it is the CI/CD pipeline maturity levels.

Collaboration

Collaboration is the key here. Your testing team, and even the developers, have to agree on what you want to build; doing it all by yourself will not go anywhere. If the developers know how testing is built and how it delivers fast results on their pull requests, they will undoubtedly support us.

Collaboration does not only mean collaborating on test scripts. Do you want the team to be able to collaborate on test scripts? Then that collaboration should be encouraged from the start by the extensibility of the testing framework.

Visibility

The test results have to be visible to your team, including developers, managers, etc. Results from the infrastructure must be public and presented at both a high level and a detailed level.

Visibility is attainable through integration between the test management tool and the test pipeline that feeds it. In some cases, can you break that link?

Automated testing framework

The baseline of testing infrastructure is the testing framework.

At first, the team has to decide: is it worth creating an in-house framework, doing a POC with one of the open-source frameworks out there, or buying a commercial solution?

Or can I use other supporting solutions such as Cypress, Playwright, Karate, or Katalon? The answer lies in how the team wants to approach automated testing. The testing framework should be a whole-team effort, not just yours or that of a few individuals.

You must think about the testing framework separately from the scripts created with it. The testing framework I mean here is distinct from the automated test scripts. How fast, stable, and maintainable the scripts are depends on the nature of the testing framework. You can produce a testing framework very quickly, but then the time spent developing test scripts and keeping up with AUT version changes or integration changes from each sprint will quickly set the testing team on fire.

Test Execution

Test execution goes hand in hand with the tests you have in the test repository. There is an interesting article on mapping testing efforts into automation.

A single execution session covers not just one test but multiple tests at the same time, on given testing environments, browsers/devices, and pipelines. The testing infrastructure comprises different solutions for specific types of breakpoints.

Data

A proper test script has 3 phases: Arrange – Act – Assert. Arrange is where you set up your test, and the most important part of it is the test data. Martin Fowler has a guide to test data preparation in his book. The main principle for the testing infrastructure is to identify the data the tests actually need.

Attaining this depends on the testing pipeline and on how the testing framework obtains data. The immediate approach is to use developer techniques such as test doubles, stubs, static data, and seeding data directly from databases. This also helps with complex work when the team wants to streamline the CI process without any human intervention.

How can this be accomplished in the testing infrastructure? To decide which data to use, think of every piece of test data as a model. If it’s a person, it will likely have a name and an age. If it’s a product, it will likely have a name, price, category, etc. With that in mind, apply changes in both the framework and the CI pipeline:

For the framework, consider applying a test data factory. Static data from JSON, data tables, or random data can all be supported and are easy to maintain.
For the CI pipeline, consider seeding data directly from databases if possible. But you need to ensure you don’t violate end customers’ data privacy.
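
As a rough illustration of the test data factory idea, here is a minimal Python sketch. The `Product` model uses the name, price, and category fields mentioned above; the `ProductFactory` class, its default values, and the seeding behaviour are assumptions made for this example, not part of any particular framework.

```python
import random
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    price: float
    category: str


class ProductFactory:
    """Hypothetical factory: static defaults, overrides, or seeded random data."""

    DEFAULTS = {"name": "Sample product", "price": 9.99, "category": "general"}

    def __init__(self, seed=0):
        # A seeded generator keeps "random" data reproducible between runs.
        self._random = random.Random(seed)

    def build(self, **overrides):
        """Return a Product from static defaults, with per-test overrides."""
        return Product(**{**self.DEFAULTS, **overrides})

    def build_random(self):
        """Return a Product with random but reproducible field values."""
        return Product(
            name=f"product-{self._random.randint(1000, 9999)}",
            price=round(self._random.uniform(1, 500), 2),
            category=self._random.choice(["general", "books", "toys"]),
        )


def test_half_price():
    # Arrange: only the field that matters to this test is spelled out.
    product = ProductFactory().build(price=100.0)
    # Act
    half_price = product.price / 2
    # Assert
    assert half_price == 50.0
```

Keeping defaults in one place means a change to the model touches the factory once instead of every test script.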

The Reporting

Reporting on a single execution is easy when the tests are executed locally. But our testing infrastructure doesn’t run small tests like that. The infrastructure is responsible for determining which tests should run, when, where, and against which environments, and for collecting the overall reports. As I’ve said previously, with the CI/CD tool you can view the final result of an executed session, but that is not really enough.

The reports need to be viewed from different perspectives, not just the creator’s. Managers want to see the reports from their own perspective, so reporting needs to be presented at different levels to suit the persona viewing it.

I usually use ReportPortal from the beginning, but many other solutions already integrate the reports.

Flaky Tests

Flaky tests are the most common type of failure you will encounter when automated scripts are executed. Google has some must-read articles on this that you can refer to:

One common way to handle flaky tests is to retry the test as a feature of the testing framework. For me, that’s incorrect.

Retry should be handled directly by the CI/CD tool instead. When a testing pipeline fails because the environment is not reachable or the network connection is very slow, let the CI/CD tool do the retry rather than the framework. It’s much more controllable directly from the pipeline.

What supports the analysis of flaky tests is information, meaning everything you can collect:

Logs
Test failure snapshot
Capture DOM at the point of failure
Historical execution results

Logs, snapshots, and the captured DOM come from the testing framework. For historical execution results, you can use the CI/CD tool or ReportPortal. ReportPortal is an open-source centralized reporting tool that gives you more insight into historical results, along with the time spent on execution.
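
To show how historical execution results can help, here is a small Python sketch that flags tests with mixed outcomes on the same code revision. The tuple shape `(test_name, revision, passed)` is a hypothetical export format assumed for this example; it is not the ReportPortal or CI/CD tool data model.

```python
from collections import defaultdict


def find_flaky_tests(runs):
    """Return names of tests that both passed and failed on one revision.

    `runs` is an iterable of (test_name, revision, passed) tuples,
    a hypothetical shape for exported execution history.
    """
    outcomes = defaultdict(set)
    for test_name, revision, passed in runs:
        outcomes[(test_name, revision)].add(passed)
    # Seeing both True and False for identical code makes the test suspect.
    return sorted({name for (name, _), seen in outcomes.items() if len(seen) == 2})


history = [
    ("test_login", "abc123", True),
    ("test_login", "abc123", False),     # same revision, different outcome
    ("test_checkout", "abc123", False),
    ("test_checkout", "abc123", False),  # consistently failing, not flaky
    ("test_search", "abc123", True),
    ("test_search", "def456", False),    # outcome changed with the code
]
print(find_flaky_tests(history))  # ['test_login']
```

A test that fails consistently, or whose outcome changes only when the code changes, is more likely a real defect than flakiness, which is why the sketch groups by revision.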

There are many other causes of flaky tests, which I won’t go into here. Refer to the Google testing blog posts mentioned above.

The CI/CD solutions

As part of the testing infrastructure, selecting a CI/CD tool is also very important here.

The choice looks easy because Jenkins is the most popular one, but I’d recommend TeamCity instead. Its pipeline visualization makes it much easier to detect flaky tests in your continuous testing scripts, and of course its minimalist UI is more refined and easier on the eyes.

Selecting the repository host for the team’s deliverables is also part of the infrastructure. Tools like GitHub Actions, GitLab Runner, and Bitbucket add value here.

The pipelines

Integrating tests into the whole development pipeline is another factor to consider.

The whole pipeline refers to the development, testing, release, and post-release pipelines. It is the sum of these pipelines that guides how the testing infrastructure is built, rather than separate components without any linkage.

Post-release monitoring

Constantly monitor what happens post-release to find interesting things and keep track of user journeys. There are many statistics for a user session, and the journeys users take define what the infrastructure needs to cover for defect detection and further improvement.

No matter which technology is chosen for monitoring, e.g., Grafana, ELK, or Woopra, what counts is a stable record of what matters most to users, which keeps us on track with our testing priorities.

Application Under Test

An application under test (AUT) is another part of the testing infrastructure. If you work in a Scrum project, changes happen in every iteration, and many different versions (release candidate, beta, official) and environments (local, staging, production) are present. Mobile applications in particular pose different challenges and are not easy to test.

Versioning

Each release has its own version, and each internal delivery for testing has a version too. Versions might differ in UI, workflow, and user scenarios. To handle this properly as part of the infrastructure, you have to manage versions directly from the automated testing framework.

The testing framework should be designed from the beginning with scale and extensibility in mind. This is not easy for non-technical testers, but you really need to think about it. Some beginner and advanced articles on design patterns that I’ve read will be useful for you:

My advice is to always think of the upcoming development infrastructure as a big picture, not a short-term solution. One day the developers decide we need to release an internal version for our staff to try first, and that is another change in the development pipeline that affects your testing infrastructure pipeline as well.

Environment

The test environment goes hand in hand with versioning. There will always be the typical staging and production environments. In more mature projects, there will also be a local one and a QA one.

Again, dealing with this is part of the design you need to think about. For me, one easy way is to use an existing supporting library such as Spring to switch between environment properties quickly and effortlessly.

Deployments

Nowadays, microservices are the star of web application deployment. For mobile applications, deployment relies on third-party distribution services such as TestFlight for iOS or App Center for Android. How the deployment itself is done is not covered here; the question is rather, after deployment, how should it blend with the infrastructure?

Every successful deployment ends up at a specific location. For web applications, with the help of Docker and Kubernetes, the deployment location can be a temporary URL that serves the current code changes. For mobile applications, the app distribution center distributes an application file that can be installed on devices. What matters is not which deployment solution you select, but that the code changes compile successfully and produce a snapshot that the testing infrastructure can grab and run tests against.

With that in mind, expose the access points of these deployments as environment variables and use them in the testing framework. The testing framework must be able to parameterize its environment configuration based on those variables so that testing is triggered against the correct deployment.
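
A minimal Python sketch of that parameterization might look like the following. The variable names `AUT_BASE_URL` and `TEST_ENV`, and the default URLs, are hypothetical and chosen only for this example; a real pipeline would export whatever names the team agrees on.

```python
import os

# Hypothetical fallbacks per named environment; real URLs would differ.
DEFAULT_BASE_URLS = {
    "local": "http://localhost:8080",
    "staging": "https://staging.example.test",
}


def resolve_base_url(environ=os.environ):
    """Prefer the URL exported by the deployment step, else a named default."""
    deployed_url = environ.get("AUT_BASE_URL")  # hypothetical variable name
    if deployed_url:
        return deployed_url.rstrip("/")
    env_name = environ.get("TEST_ENV", "local")  # hypothetical variable name
    if env_name not in DEFAULT_BASE_URLS:
        raise ValueError(f"Unknown TEST_ENV: {env_name!r}")
    return DEFAULT_BASE_URLS[env_name]
```

Passing the mapping in as a parameter keeps the function easy to check without touching the real process environment, e.g. `resolve_base_url({"TEST_ENV": "staging"})`.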

The browsers or devices to be tested.

Another thing is where you will execute the test scripts: specific browsers, devices, or both. I won’t cover fragmentation or how to decide where you should execute the tests, but rather the supporting infrastructure for picking which browsers/devices to run on.

To make this precise, the testing framework should expose a place to configure one or more targets, e.g., in JSON or YAML format, so that the team just needs to enter the desired capabilities to test on.
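
As a sketch of what such a configuration entry point could look like, the Python snippet below loads a list of targets from JSON and lets a run select a subset by name. The key names follow common desired-capabilities naming, but the file layout, the `name` field, and the `load_targets` helper are assumptions made for this example.

```python
import json

# Hypothetical config content; in practice this would live in a file.
TARGETS_JSON = """
[
  {"name": "chrome-desktop", "browserName": "chrome"},
  {"name": "pixel-5", "platformName": "Android", "deviceName": "Pixel 5"}
]
"""


def load_targets(raw, only=None):
    """Parse the target list and optionally keep only the named targets."""
    targets = json.loads(raw)
    for target in targets:
        if "name" not in target:
            raise ValueError(f"Target without a name: {target}")
    if only is None:
        return targets
    wanted = set(only)
    return [t for t in targets if t["name"] in wanted]


for target in load_targets(TARGETS_JSON, only=["pixel-5"]):
    print(target["name"], target["platformName"])
```

The pipeline can then pass the list of target names as a parameter, so the same suite runs on one device locally and on many in CI.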

Other types of testing

You don’t do only one kind of testing in the whole infrastructure. There will be more, including functional and non-functional testing. How much depends on the testing needs and the strategy that comes out of the testing meeting, but these determine what the infrastructure needs to support.

Other types of testing, if feasible, should be included in the testing framework through multi-module project support. The pipeline and the report will pick up the results for each kind of testing for visibility.

Now what

This quite long post has come to an end. I always keep in mind that testing is complicated. It’s not easy to achieve testing efficiency for the whole product while exposing visibility and encouraging collaboration. It’s more difficult when the product is complex and many different tests need to be performed.

Building up a testing infrastructure is just like building your house step by step. When building the house, you will likely run into budget issues, conflicts with family members, and decorations that change over time because your furniture estimates were wrong. Your dream house takes quite a long time to become a place you are proud of, and the same goes for building a testing infrastructure. You can’t skip building small things first to produce deliverables that let the team see the testing effort; each is a step toward bringing together the many components of the infrastructure.

Nevertheless, the infrastructure has to be measured and maintained so that it does not over-react to disruptive changes in development cycles. I hope this blog doesn’t give out too many theories to highlight the needs of testing affects majorly in the

Some (More) ways of mobile testing

March 18, 2021 by vinh.nguyen, posted in Automation, Chit&Chat, Manual, Mobile

Mobile application testing today requires sophisticated approaches to reach the desired quality. Ideally, how quality is judged for deliverables depends on the test plan and the agreement of the whole team. Unlike web applications, mobile application testing poses many challenges that trouble us over the long term:

Device diversity: Devices are heavily fragmented in terms of models, hardware, and OS
External device factors:
- Network
- Interruption
- Temperature
Releases: You can roll back and apply quick patches easily on the web thanks to mature deployment tooling for web infrastructure, but this is not easy on devices. We can’t force users to update the app on their device because of mobile OS constraints.
OS upgrades: An application can break after the device’s OS is upgraded, and this is harder to validate than for a web application
Release approval: A new version of a mobile application needs to go through the Google Play/Apple review processes, while a web application doesn’t.

All of the above combined means that several testing variations need to be applied to an application for a smooth release. Remember that preventing defects is better than fixing them, as it reduces effort and chaos around releases.

The better information

The most important resource for finding the root cause of issues lies in the application/device/custom logs generated during the session. Check for, and add to the framework, the capability to collect logs for analysis.

The better strategy

Measuring and determining a release’s status for a mobile application no longer rests on functionality alone. Functional testing alone only helps determine whether the application does exactly what we want, not what users want, nor how it performs on different devices. Instead, apply different kinds of testing to the application.

Crash Testing

Crashes are obviously the most annoying issue. There are many reasons a crash can occur, and a quick way to check for them is a chaos approach driven by randomized user inputs:

Android: Inject monkey runner command
iOS: Inject randomized user inputs using SwiftMonkey
Gather crash metrics and analytics with the help of analytics reporting tools

What crash rate is acceptable is discussed in this article. Usually this check runs post-commit and tracks crash rates along with crash logs.
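
For the Android side, a randomized-input run can be wired into a post-commit job with a small wrapper. The Python sketch below assumes the tool meant here is Android's `adb shell monkey` (the UI/Application Exerciser Monkey) and builds its command line; the package name, event count, and helper names are placeholders chosen for this example.

```python
import subprocess


def build_monkey_command(package, event_count, seed, serial=None, throttle_ms=100):
    """Build the adb command line for a reproducible monkey run."""
    command = ["adb"]
    if serial:
        command += ["-s", serial]  # target one specific attached device
    command += [
        "shell", "monkey",
        "-p", package,                   # restrict events to this package
        "-s", str(seed),                 # fixed seed so a crash can be replayed
        "--throttle", str(throttle_ms),  # delay between events, in milliseconds
        "-v", str(event_count),
    ]
    return command


def run_monkey(command):
    """Run the command; needs adb on PATH and an attached device or emulator."""
    return subprocess.run(command, capture_output=True, text=True, check=False)


# Placeholder package name for illustration only.
print(" ".join(build_monkey_command("com.example.app", 500, seed=42)))
```

Recording the seed alongside the crash log is the useful part: the same seed replays the same event sequence, which makes a crash found in CI reproducible on a developer's device.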

Visual Testing

The more device fragmentation you support, the harder it is to cover the product’s look and feel. A normal test script that verifies every detail is exhausting to write and maintain. Visual testing reduces that effort through smarter verification and improves the team’s awareness of how the app looks on target devices.

Beta Testing

Roll out the RC version to a fixed percentage of end users who will help verify the new features. You still can’t cover every specific device setting and its root causes at the same time, but this step greatly improves the user experience. Platforms that help with this kind of rollout include TestFlight and HockeyApp.

Performance Testing

This is not about validating the performance of the server infrastructure behind the application. It’s about how well the application behaves on the device under test under specific conditions. This can be quite complex, because mobile application performance includes many different factors:

Network
CPU Usage %
Memory Usage %
Backend:
- Data transactions performed in the memory
- Database queries
Render:
- Time until application interaction is allowed
- Blank/Dark screen availability
Storage usage

For easy analysis of these factors, I personally recommend Apptim, which I wrote an article about previously.

Post-release testing

Observing and monitoring post-release is also a critical way to benchmark the release. You will probably want to look at end-user reviews, how each version of the application is used over time, and whether the latest version increases user retention thanks to your fixes. These factors count toward the success of the release, and they also indicate how testing should be revamped to cover more of the ways end users actually use the release.

For this to succeed, you need to monitor:

Reviews on Google Play
Reviews on Apple Store

Even if there are few reviews on these platforms, your application’s internal tracking will help determine whether users are dropping the application.

HOW TO AUTOMATE OTP CODES ON MOBILE

February 7, 2021 (updated February 8, 2021) by hnminh, posted in Automation, Mobile

Hey yo! It's the end of the year and I have some free time, so here is another post for those of you doing automation testing on mobile.

My team's story

The story starts with my team having to work with OTP codes a lot. Anyone testing mobile applications probably knows that the current trend is to have as much security as possible, which means every application integrates OTP codes for two-factor authentication :(((. This is the pain of anyone who has to automate applications with OTP codes. For those doing human testing (that is, manual testing), testing applications with OTP codes is no big deal apart from waiting for the OTP code to arrive by SMS.

But for those doing automation, it is a fairly convoluted problem. For instance, you have to detect that the SMS has arrived on the phone, then extract the OTP code from the SMS, and on top of that some applications send 2-3 SMS messages at once while the OTP code is in only one of them. Then, for those who need to test on older Android/iOS versions, the OS does not yet support picking up the OTP from the keyboard automatically, which roughly means writing a pile of extra automation steps that interact with the GUI just to getText the SMS string (heck, GUI E2E automation is hard enough already, and adding a bunch of steps for this will probably send flaky tests through the roof).

My team's needs for the OTP code problem

From the story above, my team's needs are as simple as the list below:

A solution that gets the OTP as quickly and accurately as possible without interacting with the GUI of the mobile application
Easy to find and filter SMS by the phone number or sender address it came from
Runs as fast as possible (about as fast as the racing boys' “altar speed” =]])
Runs stably and is easy to debug later
Lets the SMS-receiving phone be shared with other test suites. Roughly, the phones that run tests have no SIM of their own and share the SIM in a single device, which is dedicated to receiving SMS for the devices that run tests
Good security, because if the mechanism for fetching OTP codes, or for storing that information, leaks, hackers could exploit it for something nasty =)))
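
To make the first two requirements concrete, here is a minimal Python sketch of filtering messages by sender and pulling the OTP out of the newest one that contains a code. The message fields (`address`, `body`, `timestamp`) and the 6-digit assumption are hypothetical and chosen for illustration; they are not the plugin's actual data format.

```python
import re

# Assumes 6-digit codes that are not part of a longer number.
OTP_PATTERN = re.compile(r"(?<!\d)(\d{6})(?!\d)")


def find_latest_otp(messages, sender):
    """Return the OTP from the newest matching SMS, or None.

    `messages` is a list of dicts with hypothetical keys
    "address", "body", and "timestamp".
    """
    from_sender = [m for m in messages if m["address"] == sender]
    for message in sorted(from_sender, key=lambda m: m["timestamp"], reverse=True):
        match = OTP_PATTERN.search(message["body"])
        if match:
            return match.group(1)
    return None


inbox = [
    {"address": "MYBANK", "body": "Welcome to MyBank!", "timestamp": 1003},
    {"address": "MYBANK", "body": "Your OTP is 482913. Valid for 2 minutes.", "timestamp": 1002},
    {"address": "PROMO", "body": "Win 100000 points today", "timestamp": 1004},
    {"address": "MYBANK", "body": "Your OTP is 111222.", "timestamp": 900},
]
print(find_latest_otp(inbox, "MYBANK"))  # 482913
```

Filtering by sender first handles the case where several SMS arrive at once and only one of them carries the code, and it avoids mistaking a number in an unrelated message for an OTP.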

After sitting down to discuss and list the requirements above, my face looked exactly like this

I sat there thinking “what kind of requirements are these, so many of them”, but I still had to find a solution because the requirements were all too reasonable :((((((

The birth of the sms-listener-service plugin

After many sleepless nights tossing and turning, thinking about ways to satisfy these requirements, and after dozens of cups of coffee plus the internet bill, I decided to implement a plugin (roughly, a native application running on the device) that exposes a RESTful API which can be called to fetch SMS from the device. Roughly, the mechanism looks like the two figures below:

**High level sms-listener-service overview**

**SMS Listener Service Application Overview**

As described in the two figures above, I wrote an application (in the current plugin version, only for the Android platform) that runs an HTTP server (backend) inside the phone itself as a ~~background service~~ foreground service. The server exposes 3 RESTful APIs so that users can retrieve the SMS currently on the device. Some of you may be wondering why I went with this choice =)) Here are the reasons:

The model of running a backend inside the device is based on how the Appium driver works, and in the future, with Appium 2.0, Appium will call this mechanism a custom driver (read this article if you are interested in Appium 2.0)
Running a backend inside the phone whose only job is to receive requests through the RESTful APIs and return SMS data to users makes fetching the OTP code faster and easier, with no GUI steps needed just to get the SMS text
I chose a netty server because, when I looked into embeddable server libraries for mobile, there were not many choices apart from two (I may have missed some, so if you know of another library, please share in the comments): nanohttpd and netty.io. I chose netty.io because the Appium driver agent on Android currently uses this library too, so why not use a library that Appium has already vetted and officially uses =))
I implemented this plugin application as an Android service (~~background service~~ foreground service) because, during mobile automation, the application being automated runs in the main process of the mobile device. If the plugin did not run as a ~~background service~~ foreground service, the HTTP server would be shut down and the device's SMS could no longer be accessed. (I originally used a background service, but on newer Android versions the service gets killed when the phone enters Doze/Standby mode, so I switched to a foreground service.)
Those of you who have worked a lot with Android applications may wonder why I did not implement the plugin as an “Incoming SMS Broadcast Receiver”, which would remove the need to keep an HTTP server running inside the device. That approach works too. The difference is that I would then have to implement an HTTP server outside the device just to receive and store the SMS forwarded from it. Roughly, every time a new SMS arrives on the phone, the plugin application would automatically call an API on that external server to store the SMS there (similar to how the mysms app works). The mechanism is similar, but this approach takes extra time to implement and set up the external HTTP server, plus a mechanism to store the SMS forwarded from the phone, which in turn means thinking about security for the stored OTP information. With an HTTP server running inside the phone, whenever users want SMS information the plugin application just reads the device's SMS and returns it immediately, so nothing needs to be stored. I also wanted to eliminate the failure mode where the phone receives the SMS but syncing it from the device to the external backend fails.

Using the sms-listener-service plugin

I've rambled long enough; now on to how to use the application. Overall it's easy to use and not complicated at all.

1. Download the plugin Android application from this link

2. Connect your Android phone to your computer and remember to enable all the modes needed to use adb commands. If you don't know which steps are required, see this article

3. Check that the computer can see the Android device with the following command

adb devices

**Checking that the computer is connected to the device**

4. Install the plugin application on the phone with the following command

adb install -g sms-listener.apk

Remember to include the “-g” argument here, which grants all the runtime permissions the app declares at install time. The application requires permissions to read and access SMS messages; if you install it without “-g”, it will not be able to run

**Installing the plugin application on the device**

5. Start the application with an adb command. You can also start it by opening the app from the GUI, but I show the adb command here so you can easily combine it with your automation scripts

adb shell am start -n "com.toilatester.smslistener/com.toilatester.sms.listener.MainActivity" --ei serverPort 8185

Here the “--ei” argument (an integer extra named serverPort) lets you configure the port the server runs on inside the device. The purpose is to let you use several devices at once on one computer. To reach the APIs of the plugin's HTTP server on the device, you have two options.

Use the command [adb forward tcp:8181 tcp:8185] to forward requests from the computer the device is connected to into the HTTP server on the device. The command above forwards requests sent to localhost port 8181 on the computer to the server running inside the device on port 8185
Access the APIs through the device's IP address. For example, when the device is connected to Wi-Fi it has an IP address, and you can call the APIs directly via the following URL: http://device_ip_address:http_server_port/{api_endpoints}
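To make the multi-device case more concrete, here is a minimal shell sketch. It assumes two devices whose serials (`DEVICE_A`, `DEVICE_B`) are placeholders for whatever `adb devices` prints on your machine, and the port numbers are arbitrary choices. The `/sms` path in the last comment is hypothetical; use the endpoints the plugin actually exposes.

```shell
#!/bin/sh
# Sketch only: serials, ports and the API path are illustrative assumptions.

# Build the base URL used to reach a device's server from the computer.
api_base_url() {
    # $1 = host (localhost when using adb forward, or the device's Wi-Fi IP)
    # $2 = port
    printf 'http://%s:%s\n' "$1" "$2"
}

# Start the plugin on one device and forward a host port to it.
start_and_forward() {
    # $1 = device serial, $2 = host port, $3 = server port inside the device
    adb -s "$1" shell am start \
        -n "com.toilatester.smslistener/com.toilatester.sms.listener.MainActivity" \
        --ei serverPort "$3"
    adb -s "$1" forward "tcp:$2" "tcp:$3"
}

# Example (uncomment and replace the serials with real ones):
# start_and_forward DEVICE_A 8181 8185
# start_and_forward DEVICE_B 8182 8186
# curl "$(api_base_url localhost 8181)/sms"   # "/sms" is a hypothetical endpoint
```

Giving each device its own host port keeps the two servers distinguishable from the computer's side, even though both are reached through localhost.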

Start plugin application

That's the end of the walkthrough; now go enjoy the results
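As one possible way to use the result in an automation script, the sketch below pulls a one-time code out of SMS text. It assumes the OTP is a standalone 6-digit number, which will not hold for every service. The `/sms` path and port 8181 in the comment are hypothetical, since the plugin's real endpoints and JSON shape are not shown in this post.

```shell
#!/bin/sh
# Sketch only: assumes the OTP is the first standalone run of exactly 6 digits.

extract_otp() {
    # Reads text on stdin and prints the first standalone 6-digit number, if any.
    grep -oE '(^|[^0-9])[0-9]{6}([^0-9]|$)' | head -n 1 | tr -cd '0-9'
    echo
}

# Example with a forwarded port (hypothetical endpoint):
# curl -s "http://localhost:8181/sms" | extract_otp
```

Longer digit runs such as phone numbers or timestamps are skipped because the pattern requires a non-digit (or line boundary) on both sides of the six digits.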

Demo of the plugin application in use

Questions about how the plugin runs

Does the plugin secretly steal the user's SMS data?

Answer: No.

Explanation: That is why I did not go with the “Incoming SMS Broadcast Receiver” approach, since it would require building an external system to store SMS data so that users could later retrieve it through RESTful APIs. My plugin is an HTTP server whose only job is to receive requests from users, read the SMS data, and return it as JSON, so it does not store that data anywhere else. You can easily verify this from the source code =)) since the plugin is open source

Can the plugin application get killed while running as a ~~background service~~ foreground service?

Answer: Yes and no

Explanation: I currently start the ~~background service~~ foreground service in “START_STICKY” mode. According to the documentation I read, when the phone is low on memory it may automatically kill some lower-priority apps; with “START_STICKY”, the system then restarts the sms-listener service automatically. My application doesn't use much of the device's memory while running, because it only receives requests and returns data right away without storing or generating extra data, so the chance of it consuming enough memory for the device to kill the service is practically nil. One caveat, though: closing the app completely is a different story. It's like a music app running in the background: when you press the recent apps button on an Android phone and swipe the app away, you are force-stopping the service, and at that point there's nothing I can do since you closed it on purpose. I will test this further anyway, but for automation purposes I don't think it's a big problem: just execute the start command for the plugin service at the beginning of each test to make sure the plugin is running, which minimizes the chance of it being down. I tested on my own Samsung Galaxy S7 Edge, and the service keeps running even after pressing the recent button and closing the application. I mentioned above that closing the app from recents might stop the service because of information in this article (if you try it and run into this, please let me know)
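The "start the plugin at the beginning of each test" advice could be scripted roughly as follows. This is a sketch under assumptions: a single connected device, host port 8181 forwarded to device port 8185, and a plain request to the server root used as the liveness check (any HTTP response, even an error status, is treated as "up"). The `retry` helper is my own and not part of the plugin.

```shell
#!/bin/sh
# Sketch only: ports and the liveness URL are illustrative assumptions.

# Run a command until it succeeds: $1 = max attempts, $2 = seconds between tries.
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        if [ "$i" -lt "$attempts" ]; then sleep "$delay"; fi
        i=$((i + 1))
    done
    return 1
}

# (Re)start the plugin, forward the port, then wait until the server answers.
ensure_listener() {
    # $1 = host port, $2 = server port inside the device
    adb shell am start \
        -n "com.toilatester.smslistener/com.toilatester.sms.listener.MainActivity" \
        --ei serverPort "$2"
    adb forward "tcp:$1" "tcp:$2"
    retry 5 1 curl -s -o /dev/null "http://localhost:$1/"
}

# Example, at the top of a test run:
# ensure_listener 8181 8185 || { echo "sms-listener is not reachable" >&2; exit 1; }
```

Failing fast here gives a clearer error than an OTP step timing out much later in the test.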

When will this plugin be available on iOS?

Answer: No idea yet; it depends on how much free time I have =))

Guide to using the InfluxDB and Loki plugin with JMeter

January 24, 2021 by hnminh, posted in Load testing, Performance

JMeter Real-Time Monitoring, Integration With Grafana+InfluxDB 2.0 (Flux) - DZone Integration

Anyone who has done performance testing (also called load testing, although I don't really like that name because load testing is only one part of performance testing =]]) has probably run into questions like: how can I follow load test results in real time? How can I store load test data and compare it against other metrics along a time axis (with an HTML report it's very hard to pinpoint the exact time of a suspected performance problem and check it against system metrics)? How can I compare the current load test results with those from 1, 2, 3, 4, ... months ago? Or even, how can I store the response data produced during a load test efficiently?

Because of these difficulties, my team wrote a JMeter plugin in our spare time to solve them.

Installation guide

First, go to this link and hit like and follow (just kidding, but feel free to do it anyway :D). Then go to this link and download the plugin (the file to download is shown in the image below)

After downloading the file, extract it and copy the file “jmeter-backend-listener-plugin.jar” into the lib/ext directory (located in the folder where you installed JMeter)

And that's the whole installation =)) it really couldn't be any easier
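If you prefer to script the copy step, a minimal sketch could look like this. The JMeter install path in the example is an assumption; point it at your own installation.

```shell
#!/bin/sh
# Sketch only: the JMeter install path is an assumption; adjust it to your setup.

install_plugin() {
    # $1 = path to jmeter-backend-listener-plugin.jar, $2 = JMeter install folder
    if [ ! -d "$2/lib/ext" ]; then
        echo "lib/ext not found under $2" >&2
        return 1
    fi
    cp "$1" "$2/lib/ext/"
}

# Example:
# install_plugin ./jmeter-backend-listener-plugin.jar "$HOME/apache-jmeter"
```

JMeter scans lib/ext at startup, so restart it after copying the jar if it was already open.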

Configuring the plugin with JMeter

Once the plugin is installed, the next step is configuring it for use, which is also very simple; follow the steps below

First, you need to open JMeter =)) you can't configure or use anything without opening it, right =)). Then create any thread group. After these two steps, your JMeter structure will look like the image below

Next, add a “Backend Listener” element in JMeter as shown in the image below

After adding it, your JMeter test plan structure will look like the image below

Now for the listener plugin configuration: click the “Backend Listener” element and choose one of the following two options (each option corresponds to sending JMeter metrics or JMeter response data to InfluxDB or Loki)

After choosing the backend listener type, configure its parameters

config-influxdb-listener — InfluxDB Backend Listener Configuration

config-loki-listener — Loki Backend Listener Configuration

For details on each configuration option, see the following link

Note: If you want to use both InfluxDB and Loki, just add two “Backend Listener” elements to the JMeter test plan, but remember to rename both elements. JMeter broadcasts sampler results to each listener by its element name, so two elements left with the default Backend Listener name would clash

Next, there is one more step: import the Grafana dashboard that we built for our plugin. Download the template from the following link, then import it into Grafana (if you don't know how to import a Grafana dashboard, please Google it for me =)) I assume everyone here has internet access and can reach Google to search). You also need to install InfluxDB, Loki, and Grafana (see the installation guides linked from each name)

Once all of that is done, run your JMeter test and enjoy the results

Frequently asked questions about the plugin

1. Does using this plugin affect the performance test results?

Answer: The plugin does not affect the performance test results

=> Explanation: With JMeter's backend listener mechanism, when JMeter sends a request to the SUT/AUT (system under test/application under test), the result of that request is handed to the listener plugin on a separate thread. Roughly speaking, JMeter has two separate kinds of threads when you use this plugin: one that only sends requests to the system you are testing, and one that sends the results to InfluxDB or Loki. With this mechanism, if InfluxDB or Loki goes down in the middle of a performance test, the load test keeps running without interruption, and if sending data to the metrics collection system is slow, it doesn't affect the performance test results either.

2. The Master machine hangs when running a performance test in distributed mode with the plugin?

Answer: It's not a plugin bug; the cause is how JMeter receives sampler results when running in distributed mode =))

=> Explanation: When you run a performance test with JMeter in “Distributed” mode (also known as master and slave; translated literally, the boss and the slaves =]]), the master machine is responsible for issuing the run command and sending the *.jmx file (the test script) down to the slave machines. The slaves send the requests to the system under test, and the results are then sent back to the master. This is usually where the hang comes from. A common mistake is to run the Slave on the same physical machine (that is, the machine used to run JMeter) that is running the Master; that machine then gets overloaded on memory, CPU, and network all at once, and hangs. To limit this, it's best to run the master and each slave on separate physical machines. In addition, you can edit the jmeter.properties file (in the bin folder) to configure how the slaves send results back to the master, as shown in the image below:
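The screenshot is not reproduced here, so the following is only a hedged illustration rather than the author's exact settings. The stock jmeter.properties file has a `mode` property that selects how samples are sent back (for example `Standard`, `Batch`, `Statistical`, `StrippedBatch`, `Asynch`), along with `num_sample_threshold` and `time_threshold` for the batch modes. The sketch below prints a non-GUI command line that overrides them with `-J` on the master instead of editing the file; the host names, file names, and chosen values are placeholders.

```shell
#!/bin/sh
# Sketch only: hosts, file names and values below are placeholders.

# Print a non-GUI distributed-run command that sends results back in batches.
distributed_cmd() {
    # $1 = test plan (.jmx), $2 = comma-separated slave hosts, $3 = results file
    printf 'jmeter -n -t %s -R %s -l %s -Jmode=%s -Jnum_sample_threshold=%s -Jtime_threshold=%s\n' \
        "$1" "$2" "$3" "Batch" "100" "60000"
}

# Print the command, review it, then run it yourself:
# distributed_cmd plan.jmx slave1,slave2 results.jtl
```

With `Batch`, samples are returned once either 100 samples have accumulated or 60000 ms have passed, which reduces the number of round trips to the master compared with sending every sample individually.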

3. Querying or loading results in Grafana is very slow when using the plugin?

Answer: Well, that's life; this one is outside the scope of the plugin

=> Explanation: InfluxDB's storage mechanisms, such as how it splits data into tables/shards (similar to indexing in SQL databases), mean that loading data into Grafana takes time to query the metrics you need. Heavy use of regex also increases the time Grafana needs to fetch data from InfluxDB