Group items tagged: parallel

crazylion lee

Java Streams, Part 4: From concurrent to parallel - 0 views

  • "Understanding the factors influencing parallel performance"
張 旭

Choose when to run jobs | GitLab - 0 views

  • Rules are evaluated in order until the first match.
  • If no rules match, the job is not added to the pipeline.
  • define a set of rules to exclude jobs in a few cases, but run them in all other cases
  • use all rules keywords, like if, changes, and exists, in the same rule. The rule evaluates to true only when all included keywords evaluate to true.
  • use parentheses with && and || to build more complicated variable expressions.
  • Use workflow to specify which types of pipelines can run.
  • every push to an open merge request’s source branch causes duplicated pipelines.
  • avoid duplicate pipelines by changing the job rules to avoid either push (branch) pipelines or merge request pipelines.
  • do not mix only/except jobs with rules jobs in the same pipeline.
  • For behavior similar to the only/except keywords, you can check the value of the $CI_PIPELINE_SOURCE variable
  • commonly used variables for if clauses
  • rules:changes expressions to determine when to add jobs to a pipeline
  • Use !reference tags to reuse rules in different jobs.
  • Use except to define when a job does not run.
  • only or except used without refs is the same as only:refs / except:refs
  • If you change multiple files, but only one file ends in .md, the build job is still skipped.
  • If you use multiple keywords with only or except, the keywords are evaluated as a single conjoined expression.
  • only includes the job if all of the keys have at least one condition that matches.
  • except excludes the job if any of the keys have at least one condition that matches.
  • With only, individual keys are logically joined by an AND
  • With except, individual keys are logically joined by an OR
  • To specify a job as manual, add when: manual to the job in the .gitlab-ci.yml file.
  • Use protected environments to define a list of users authorized to run a manual job.
  • Use when: delayed to execute scripts after a waiting period, or if you want to avoid jobs immediately entering the pending state.
  • To split a large job into multiple smaller jobs that run in parallel, use the parallel keyword
  • run a trigger job multiple times in parallel in a single pipeline, but with different variable values for each instance of the job.
  • The @ symbol denotes the beginning of a ref’s repository path. To match a ref name that contains the @ character in a regular expression, you must use the hex character code match \x40.
  • Compare a variable to a string
  • Check if a variable is undefined
  • Check if a variable is empty
  • Check if a variable exists
  • Matches are found when using =~.
  • Matches are not found when using !~.
  • Join variable expressions together with && or ||
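Taken together, the rules above can be sketched as a minimal .gitlab-ci.yml. This is an illustrative sketch, not an excerpt from the GitLab docs: the job names, make commands, and the docs/ path are hypothetical.

```yaml
workflow:                        # decide which pipeline types can run at all
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH && $CI_OPEN_MERGE_REQUESTS
      when: never                # avoid duplicate branch + merge request pipelines
    - if: $CI_COMMIT_BRANCH

docs:
  script: make docs              # hypothetical command
  rules:                         # evaluated in order until the first match
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
      changes:
        - docs/**/*
    - when: manual               # in all other cases, run only on demand
      allow_failure: true

test:
  script: make test              # hypothetical command
  parallel: 5                    # split one large job into five parallel jobs
```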
crazylion lee

baidu/Paddle: PArallel Distributed Deep LEarning - 0 views

  • "PArallel Distributed Deep LEarning http://www.paddlepaddle.org/"
crazylion lee

anthonynsimon/bild: A collection of parallel image processing algorithms in pure Go - 0 views

  • "A collection of parallel image processing algorithms in pure Go"
crazylion lee

hirak/prestissimo: composer parallel install plugin - 0 views

  • "composer parallel install plugin"
張 旭

Using Workflows to Schedule Jobs - CircleCI - 1 views

  • A workflow is a set of rules for defining a collection of jobs and their run order.
  • Schedule workflows for jobs that should only run periodically.
  • run multiple jobs in parallel
  • rerun just the failed job
  • Builds without workflows require a build job.
  • Refer to the YAML Anchors/Aliases documentation for information about how to alias and reuse syntax to keep your .circleci/config.yml file small.
  • workflow orchestration with two parallel jobs
  • jobs run according to configured requirements, each job waiting to start until the required job finishes successfully
  • requires: key
  • fans-out to run a set of acceptance test jobs in parallel, and finally fans-in to run a common deploy job.
  • Holding a Workflow for a Manual Approval
  • Workflows can be configured to wait for manual approval of a job before continuing to the next job
  • add a job to the jobs list with the key type: approval
  • approval is a special job type that is only available to jobs under the workflow key
  • The name of the job to hold is arbitrary - it could be wait or pause, for example, as long as the job has a type: approval key in it.
  • schedule a workflow to run at a certain time for specific branches.
  • The triggers key is only added under your workflows key
  • using cron syntax to represent Coordinated Universal Time (UTC) for specified branches.
  • By default, a workflow is triggered on every git push
  • the commit workflow has no triggers key and will run on every git push
  • The nightly workflow has a triggers key and will run on the specified schedule
  • Cron step syntax (for example, */1, */20) is not supported.
  • use a context to share environment variables
  • use the same shared environment variables when initiated by a user who is part of the organization.
  • CircleCI does not run workflows for tags unless you explicitly specify tag filters.
  • CircleCI branch and tag filters support the Java variant of regex pattern matching.
  • Each workflow has an associated workspace which can be used to transfer files to downstream jobs as the workflow progresses.
  • The workspace is an additive-only store of data.
  • Jobs can persist data to the workspace
  • Downstream jobs can attach the workspace to their container filesystem.
  • Attaching the workspace downloads and unpacks each layer based on the ordering of the upstream jobs in the workflow graph.
  • Workflows that include jobs running on multiple branches may require data to be shared using workspaces
  • To persist data from a job and make it available to other jobs, configure the job to use the persist_to_workspace key.
  • Files and directories named in the paths: property of persist_to_workspace will be uploaded to the workflow’s temporary workspace relative to the directory specified with the root key.
  • Configure a job to get saved data by configuring the attach_workspace key.
  • persist_to_workspace
  • attach_workspace
  • To rerun only a workflow’s failed jobs, click the Workflows icon in the app and select a workflow to see the status of each job, then click the Rerun button and select Rerun from failed.
  • if you do not see your workflows triggering, a configuration error is preventing the workflow from starting.
  • check your Workflows page of the CircleCI app (not the Job page)
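A config.yml sketch combining the ideas above - fan-out/fan-in, a manual approval gate, a nightly schedule, and workspaces. The job names, the cimg/base image, and the make targets are hypothetical; the keys themselves (requires, type: approval, triggers, persist_to_workspace, attach_workspace) are the documented ones.

```yaml
version: 2.1

jobs:
  build:
    docker:
      - image: cimg/base:stable     # illustrative image
    steps:
      - checkout
      - run: make build             # hypothetical build command
      - persist_to_workspace:       # make build output available downstream
          root: .
          paths:
            - dist
  test-a:
    docker:
      - image: cimg/base:stable
    steps:
      - attach_workspace:           # unpack files persisted by upstream jobs
          at: .
      - run: make test-a            # hypothetical
  test-b:
    docker:
      - image: cimg/base:stable
    steps:
      - attach_workspace:
          at: .
      - run: make test-b            # hypothetical
  deploy:
    docker:
      - image: cimg/base:stable
    steps:
      - attach_workspace:
          at: .
      - run: make deploy            # hypothetical

workflows:
  build-test-deploy:
    jobs:
      - build
      - test-a:                     # fan-out: test-a and test-b run in parallel
          requires:
            - build
      - test-b:
          requires:
            - build
      - hold:                       # pauses until someone approves in the UI
          type: approval
          requires:
            - test-a
            - test-b
      - deploy:                     # fan-in: runs only after approval
          requires:
            - hold
  nightly:
    triggers:
      - schedule:
          cron: "0 0 * * *"         # UTC; cron step syntax like */20 is not supported
          filters:
            branches:
              only:
                - main
    jobs:
      - build
```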
張 旭

Understanding GitHub Actions - GitHub Docs - 0 views

  • A job is a set of steps that execute on the same runner. By default, a workflow with multiple jobs will run those jobs in parallel.
  • Workflows are made up of one or more jobs and can be scheduled or triggered by an event
  • An event is a specific activity that triggers a workflow.
  • configure a workflow to run jobs sequentially.
  • A step is an individual task that can run commands in a job. A step can be either an action or a shell command.
  • Each step in a job executes on the same runner, allowing the actions in that job to share data with each other.
  • Actions are standalone commands that are combined into steps to create a job.
  • Actions are the smallest portable building block of a workflow.
  • To use an action in a workflow, you must include it as a step.
  • You can use a runner hosted by GitHub, or you can host your own.
  • GitHub-hosted runners are based on Ubuntu Linux, Microsoft Windows, and macOS, and each job in a workflow runs in a fresh virtual environment.
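A small workflow file sketching these pieces - an event trigger, jobs that run in parallel by default, an action used as a step, shell-command steps, and needs for sequential ordering. The file path and make targets are hypothetical.

```yaml
# .github/workflows/ci.yml (illustrative path)
name: CI
on: [push]                        # an event triggers the workflow

jobs:
  lint:                           # lint and test run in parallel by default
    runs-on: ubuntu-latest        # a GitHub-hosted runner
    steps:
      - uses: actions/checkout@v4 # an action included as a step
      - run: make lint            # a shell-command step (hypothetical target)
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make test
  deploy:
    needs: [lint, test]           # runs sequentially, after both jobs succeed
    runs-on: ubuntu-latest
    steps:
      - run: echo "deploy here"   # placeholder command
```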
張 旭

Best Practices · mperham/sidekiq Wiki - 0 views

  • Don't save state to Sidekiq, save simple identifiers.
  • Look up the objects once you actually need them in your perform method.
  • The Sidekiq client API uses JSON.dump to send the data to Redis
  • The Sidekiq server pulls that JSON data from Redis and uses JSON.load to convert the data back into Ruby types to pass to your perform method
  • Idempotency means that your job can safely execute multiple times
  • use a database transaction to ensure data changes are rolled back if there is an error
  • Sidekiq will execute your job at least once.
  • Sidekiq is designed for parallel execution, so design your jobs so that you can run lots of them in parallel.
  • Sidekiq will not provide features which hack around a lack of concurrency in your jobs.
張 旭

How to Benchmark Performance of MySQL & MariaDB Using SysBench | Severalnines - 1 views

  • SysBench is a C binary which uses Lua scripts to execute benchmarks
  • support for parallelization in the Lua scripts: multiple queries can be executed in parallel
  • By default, it ships with benchmarks which cover most of the common cases: OLTP workloads, read-only or read-write, primary key lookups and primary key updates.
  • SysBench is not a tool which you can use to tune the configuration of your MySQL servers (unless you prepared Lua scripts with a custom workload, or your workload happens to be very similar to the benchmark workloads that SysBench comes with)
  • What it is great for is comparing the performance of different hardware.
  • Every new server acquired should go through a warm-up period during which you will stress it to pinpoint potential hardware defects
  • by executing an OLTP workload which overloads the server; you can also use dedicated benchmarks for CPU, disk and memory.
  • bulk_insert.lua. This test can be used to benchmark the ability of MySQL to perform multi-row inserts.
  • All oltp_* scripts share a common table structure. The first two of them (oltp_delete.lua and oltp_insert.lua) execute single DELETE and INSERT statements.
  • oltp_point_select, oltp_update_index and oltp_update_non_index. These will execute a subset of queries - primary key-based selects, index-based updates and non-index-based updates.
  • you can run different workload patterns using the same benchmark.
  • Warmup helps to identify “regular” throughput by executing the benchmark for a predefined time, allowing the cache, buffer pools, etc. to warm up.
  • By default, SysBench will attempt to execute queries as fast as possible. To simulate slower traffic, a rate-limiting option may be used: you can define how many transactions should be executed per second.
  • SysBench gives you the ability to generate different types of data distribution.
  • Decide whether or not SysBench should use prepared statements (as long as they are available in the given datastore; for MySQL this means prepared statements will be enabled by default).
  • sysbench ./sysbench/src/lua/oltp_read_write.lua help
  • By default, SysBench will attempt to execute queries in an explicit transaction. This way the data set will stay consistent and unaffected: SysBench will, for example, execute INSERT and DELETE on the same row, making sure the data set will not grow (which would impact your ability to reproduce results).
  • specify error codes from MySQL which SysBench should ignore (and not kill the connection).
  • the two most popular benchmarks - OLTP read only and OLTP read/write.
  • 1 million rows will result in ~240 MB of data. Ten tables with 1,000,000 rows each equal roughly 2.4 GB.
  • By default, SysBench looks for an ‘sbtest’ schema, which has to exist before you prepare the data set. You may have to create it manually.
  • Pass the ‘--histogram’ argument to SysBench.
  • ~48 GB of data (20 tables, 10,000,000 rows each).
  • If you don’t understand why the performance was what it was, you may draw incorrect conclusions from the benchmarks.
張 旭

Serverless Architectures - 0 views

  • Serverless was first used to describe applications that significantly or fully depend on 3rd party applications / services (‘in the cloud’) to manage server-side logic and state.
  • ‘rich client’ applications (think single page web apps, or mobile apps) that use the vast ecosystem of cloud accessible databases (like Parse, Firebase), authentication services (Auth0, AWS Cognito), etc.
  • ‘(Mobile) Backend as a Service’
  • Serverless can also mean applications where some amount of server-side logic is still written by the application developer but unlike traditional architectures is run in stateless compute containers that are event-triggered, ephemeral (may only last for one invocation), and fully managed by a 3rd party.
  • ‘Functions as a Service’ (FaaS)
  • AWS Lambda is one of the most popular implementations of FaaS at present.
  • A good example is Auth0 - they started initially with BaaS ‘Authentication as a Service’, but with Auth0 Webtask they are entering the FaaS space.
  • a typical ecommerce app
  • a backend data-processing service
  • with zero administration.
  • FaaS offerings do not require coding to a specific framework or library.
  • Horizontal scaling is completely automatic, elastic, and managed by the provider
  • Functions in FaaS are triggered by event types defined by the provider.
  • a FaaS-supported message broker
  • from a deployment-unit point of view FaaS functions are stateless.
  • allowed the client direct access to a subset of our database
  • deleted the authentication logic in the original application and have replaced it with a third party BaaS service
  • The client is in fact well on its way to becoming a Single Page Application.
  • implement a FaaS function that responds to http requests via an API Gateway
  • port the search code from the Pet Store server to the Pet Store Search function
  • replaced a long lived consumer application with a FaaS function that runs within the event driven context
  • server applications - is a key difference when comparing with other modern architectural trends like containers and PaaS
  • the only code that needs to change when moving to FaaS is the ‘main method / startup’ code, in that it is deleted, and likely the specific code that is the top-level message handler (the ‘message listener interface’ implementation), but this might only be a change in method signature
  • With FaaS you need to write the function ahead of time to assume parallelism
  • Most providers also allow functions to be triggered as a response to inbound http requests, typically in some kind of API gateway
  • you should assume that for any given invocation of a function none of the in-process or host state that you create will be available to any subsequent invocation.
  • FaaS functions are either naturally stateless
  • store state across requests or for further input to handle a request.
  • certain classes of long lived task are not suited to FaaS functions without re-architecture
  • if you were writing a low-latency trading application you probably wouldn’t want to use FaaS systems at this time
  • An API Gateway is an HTTP server where routes / endpoints are defined in configuration and each route is associated with a FaaS function.
  • API Gateway will allow mapping from http request parameters to inputs arguments for the FaaS function
  • API Gateways may also perform authentication, input validation, response code mapping, etc.
  • the Serverless Framework makes working with API Gateway + Lambda significantly easier than using the first principles provided by AWS.
  • Apex - a project to ‘Build, deploy, and manage AWS Lambda functions with ease.'
  • 'Serverless' to mean the union of a couple of other ideas - 'Backend as a Service' and 'Functions as a Service'.
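As a sketch of the API Gateway + FaaS pairing described above, a minimal Serverless Framework serverless.yml might look like the following. The service name, runtime, and handler are hypothetical; only the general shape (a function bound to an HTTP route via API Gateway) comes from the article.

```yaml
# serverless.yml - illustrative sketch, not taken from the article
service: pet-store-search

provider:
  name: aws
  runtime: nodejs18.x        # assumed runtime

functions:
  search:
    handler: handler.search  # hypothetical handler module/function
    events:
      - http:                # API Gateway route mapped to this FaaS function
          path: search
          method: get
```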
張 旭

Orbs, Jobs, Steps, and Workflows - CircleCI - 0 views

  • Orbs are packages of config that you either import by name or configure inline, in order to simplify your config and to share and reuse config within and across projects.
  • Jobs are a collection of Steps.
  • All of the steps in the job are executed in a single unit which consumes a CircleCI container from your plan while it’s running.
  • Workspaces persist data between jobs in a single Workflow.
  • Caching persists data between the same job in different Workflow builds.
  • Artifacts persist data after a Workflow has finished.
  • run using the machine executor, which enables reuse of recently used machine executor runs
  • the docker executor, which can compose Docker containers to run your tests and any services they require
  • macos executor
  • Steps are a collection of executable commands which are run during a job
  • In addition to the run: key, keys for save_cache:, restore_cache:, deploy:, store_artifacts:, store_test_results: and add_ssh_keys are nested under Steps.
  • The checkout: key is required to check out your code
  • run: enables addition of arbitrary, multi-line shell command scripting
  • orchestrating job runs with parallel, sequential, and manual approval workflows.
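A single-job sketch exercising the step keys listed above. The Node image, npm commands, and paths are hypothetical; checkout, run, save_cache, restore_cache, store_test_results, and store_artifacts are the documented keys.

```yaml
version: 2.1

jobs:
  build:
    docker:                        # docker executor
      - image: cimg/node:lts       # illustrative image
    steps:
      - checkout                   # required to check out your code
      - restore_cache:             # reuse data cached by earlier builds of this job
          keys:
            - deps-{{ checksum "package-lock.json" }}
      - run: npm ci                # single-line shell command
      - save_cache:
          key: deps-{{ checksum "package-lock.json" }}
          paths:
            - ~/.npm
      - run:                       # multi-line shell command scripting
          name: Run tests
          command: |
            mkdir -p test-results
            npm test
      - store_test_results:        # surfaces results in the CircleCI UI
          path: test-results
      - store_artifacts:           # persists data after the workflow finishes
          path: dist
```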
張 旭

Makefiles - Best practices and suggestions | MDN - 0 views

  • hardcoded values - avoid them like the plague
  • For classes of hardware (unix/windows), place your makefile in a subdirectory, e.g. unix/Makefile.in
  • Initial make call should always be the workhorse: build, generate, deploy, install, etc.
  • All subsequent make calls must become a NOP unless sources or dependencies change or have been removed.
    • 張 旭: NOP is an abbreviation of "No Operation" or "No Operation Performed", meaning no operation is carried out
  • Do not use directories as a dependency for generated targets, ever.
  • Parallel make: add an explicit timestamp dependency (.done) that make can synchronize threaded calls on to avoid a race condition.
  • Maintain clean targets - makefiles should be able to remove all content that is generated so "make clean" will return the sandbox/directory back to a clean state.
  • Wrap check/unit tests with an ENABLE_TESTS conditional
張 旭

Intro to deployment strategies: blue-green, canary, and more - DEV Community - 0 views

  • using a service-oriented architecture and microservices approach, developers can design a code base to be modular.
  • Modern applications are often distributed and cloud-based
  • different release cycles for different components
  • the abstraction of the infrastructure layer, which is now considered code. Deployment of a new application may require the deployment of new infrastructure code as well.
  • "big bang" deployments update whole or large parts of an application in one fell swoop.
  • Big bang deployments required the business to conduct extensive development and testing before release, often associated with the "waterfall model" of large sequential releases.
  • Rollbacks are often costly, time-consuming, or even impossible.
  • In a rolling deployment, an application’s new version gradually replaces the old one.
  • new and old versions will coexist without affecting functionality or user experience.
  • Each container is modified to download the latest image from the app vendor’s site.
  • two identical production environments work in parallel.
  • Once the testing results are successful, application traffic is routed from blue to green.
  • In a blue-green deployment, both systems use the same persistence layer or database back end.
  • You can use the primary database (used by blue) for write operations and the secondary (used by green) for read operations.
  • Blue-green deployments rely on traffic routing.
  • long TTL values can delay these changes.
  • The main challenge of canary deployment is to devise a way to route some users to the new application.
  • Using application logic to unlock new features for specific users and groups.
  • With CD, the CI-built code artifact is packaged and always ready to be deployed in one or more environments.
  • Use Build Automation tools to automate environment builds
  • Use configuration management tools
  • Enable automated rollbacks for deployments
  • An application performance monitoring (APM) tool can help your team monitor critical performance metrics including server response times after deployments.
張 旭

State: Workspaces - Terraform by HashiCorp - 0 views

  • The persistent data stored in the backend belongs to a workspace.
  • Certain backends support multiple named workspaces, allowing multiple states to be associated with a single configuration.
  • Terraform starts with a single workspace named "default". This workspace is special both because it is the default and also because it cannot ever be deleted.
  • Within your Terraform configuration, you may include the name of the current workspace using the ${terraform.workspace} interpolation sequence.
  • changing behavior based on the workspace.
  • Named workspaces allow conveniently switching between multiple instances of a single configuration within its single backend.
  • A common use for multiple workspaces is to create a parallel, distinct copy of a set of infrastructure in order to test a set of changes before modifying the main production infrastructure.
  • Non-default workspaces are often related to feature branches in version control.
  • Workspaces alone are not a suitable tool for system decomposition, because each subsystem should have its own separate configuration and backend, and will thus have its own distinct set of workspaces.
  • In particular, organizations commonly want to create a strong separation between multiple deployments of the same infrastructure serving different development stages (e.g. staging vs. production) or different internal teams.
  • use one or more re-usable modules to represent the common elements, and then represent each instance as a separate configuration that instantiates those common elements in the context of a different backend.
  • If a Terraform state for one configuration is stored in a remote backend that is accessible to other configurations then terraform_remote_state can be used to directly consume its root module outputs from those other configurations.
  • For server addresses, use a provider-specific resource to create a DNS record with a predictable name and then either use that name directly or use the dns provider to retrieve the published addresses in other configurations.
  • Workspaces are technically equivalent to renaming your state file.
  • using a remote backend instead is recommended when there are multiple collaborators.
張 旭

architecture - Difference between a "coroutine" and a "thread"? - Stack Overflow - 0 views

  • Co stands for cooperation. A coroutine is asked to (or better, expected to) willingly suspend its execution to give other coroutines a chance to execute too. So a coroutine is about sharing CPU resources (willingly) so others can use the same resource as oneself is using.
  • A thread, on the other hand, does not need to suspend its execution. Being suspended is completely transparent to the thread: the thread is forced by the underlying hardware to suspend itself.
  • Coroutines cannot be executed concurrently, so race conditions cannot occur.
  • Concurrency is the separation of tasks to provide interleaved execution.
  • Parallelism is the simultaneous execution of multiple pieces of work in order to increase speed.
  • With threads, the operating system switches running threads preemptively according to its scheduler, which is an algorithm in the operating system kernel.
  • With coroutines, the programmer and programming language determine when to switch coroutines
  • In contrast to threads, which are pre-emptively scheduled by the operating system, coroutine switches are cooperative, meaning the programmer (and possibly the programming language and its runtime) controls when a switch will happen.
  • preemption
  • Coroutines are a form of sequential processing: only one is executing at any given time
  • Threads are (at least conceptually) a form of concurrent processing: multiple threads may be executing at any given time.