Skip to main content

Home/ Larvata/ Group items tagged sql

Rss Feed Group items tagged

張 旭

Active Record Migrations - Ruby on Rails Guides - 0 views

  • a convenient way to alter your database schema
  • each migration as being a new 'version' of the database
  • On databases that support transactions with statements that change the schema, migrations are wrapped in a transaction
  • ...14 more annotations...
  • a UTC timestamp identifying the migration
  • references(also available as belongs_to)
  • produce join tables if JoinTable is part of the name
  • The model and scaffold generators will create migrations appropriate for adding a new model
  • The create_table method is one of the most fundamental
  • By default, create_table will create a primary key called id
  • the default is ENGINE=InnoDB
  • Migration method create_join_table creates a HABTM join table.
  • By default, create_join_table will create two columns with no options
  • change_table, used for changing existing tables
  • execute method to execute arbitrary SQL
  • The change method is the primary way of writing migrations
  • migration definitions
  • write the up and down methods instead of using the change method
張 旭

Scalable architecture without magic (and how to build it if you're not Google) - DEV Co... - 0 views

  • Don’t mess up write-first and read-first databases.
  • keep them stateless.
  • you should know how to make a scalable setup on bare metal.
  • ...29 more annotations...
  • Different programming languages are for different tasks.
  • Go or C which are compiled to run on bare metal.
  • To run NodeJS on multiple cores, you have to use something like PM2, but since this you have to keep your code stateless.
  • Python have very rich and sugary syntax that’s great for working with data while keeping your code small and expressive.
  • SQL is almost always slower than NoSQL
  • databases are often read-first or write-first
  • write-first, just like Cassandra.
  • store all of your data to your databases and leave nothing at backend
  • Functional code is stateless by default
  • It’s better to go for stateless right from the very beginning.
  • deliver exactly the same responses for same requests.
  • Sessions? Store them at Redis and allow all of your servers to access it.
  • Only the first user will trigger a data query, and all others will be receiving exactly the same data straight from the RAM
  • never, never cache user input
  • Only the server output should be cached
  • Varnish is a great cache option that works with HTTP responses, so it may work with any backend.
  • a rate limiter – if there’s not enough time have passed since last request, the ongoing request will be denied.
  • different requests blasting every 10ms can bring your server down
  • Just set up entry relations and allow your database to calculate external keys for you
  • the query planner will always be faster than your backend.
  • Backend should have different responsibilities: hashing, building web pages from data and templates, managing sessions and so on.
  • For anything related to data management or data models, move it to your database as procedures or queries.
  • a distributed database.
  • your code has to be stateless
  • Move anything related to the data to the database.
  • For load-balancing a database, go for cluster.
  • DB is balancing requests, as well as your backend.
  • Users from different continents are separated with DNS.
  • Keep is scalable, keep is stateless.
  •  
    "Don't mess up write-first and read-first databases."
張 旭

我必须得告诉大家的MySQL优化原理 - 简书 - 0 views

  • 很多的查询优化工作实际上就是遵循一些原则让MySQL的优化器能够按照预想的合理方式运行而已
  • MySQL客户端/服务端通信协议是“半双工”的:在任一时刻,要么是服务器向客户端发送数据,要么是客户端向服务器发送数据,这两个动作不能同时发生。
  • 当查询语句很长的时候,需要设置max_allowed_packet参数。
  • ...70 more annotations...
  • 如果查询实在是太大,服务端会拒绝接收更多数据并抛出异常
  • 服务器响应给用户的数据通常会很多,由多个数据包组成
  • 减小通信间数据包的大小和数量是一个非常好的习惯
  • 查询中尽量避免使用SELECT *以及加上LIMIT限制
  • 在解析一个查询语句前,如果查询缓存是打开的,那么MySQL会检查这个查询语句是否命中查询缓存中的数据。
  • 两个查询在任何字符上的不同(例如:空格、注释),都会导致缓存不会命中。
  • MySQL将缓存存放在一个引用表
  • 如果查询中包含任何用户自定义函数、存储函数、用户变量、临时表、mysql库中的系统表,其查询结果 都不会被缓存。
  • MySQL的查询缓存系统会跟踪查询中涉及的每个表,如果这些表(数据或结构)发生变化,那么和这张表相关的所有缓存数据都将失效。
  • 在任何的写操作时,MySQL必须将对应表的所有缓存都设置为失效。
  • 查询缓存对系统的额外消耗也不仅仅在写操作,读操作也不例外
  • 任何的查询语句在开始之前都必须经过检查,即使这条SQL语句永远不会命中缓存
  • 如果查询结果可以被缓存,那么执行完成后,会将结果存入缓存,也会带来额外的系统消耗
  • 并不是什么情况下查询缓存都会提高系统性能,缓存和失效都会带来额外消耗
  • 用多个小表代替一个大表
  • 批量插入代替循环单条插入
  • 合理控制缓存空间大小
  • 可以通过SQL_CACHE和SQL_NO_CACHE来控制某个查询语句是否需要进行缓存
  • 不要轻易打开查询缓存,特别是写密集型应用
  • 将query_cache_type设置为DEMAND,这时只有加入SQL_CACHE的查询才会走缓存,其他查询则不会,这样可以非常自由地控制哪些查询需要被缓存。
  • 预处理则会根据MySQL规则进一步检查解析树是否合法。比如检查要查询的数据表和数据列是否存在等等。
  • 一条查询可以有很多种执行方式,最后都返回相应的结果。优化器的作用就是找到这其中最好的执行计划。
  • 查询当前会话的last_query_cost的值来得到其计算当前查询的成本。
  • 尝试预测一个查询使用某种执行计划时的成本
  • 有非常多的原因会导致MySQL选择错误的执行计划
  • MySQL值选择它认为成本小的,但成本小并不意味着执行时间短
  • 提前终止查询(比如:使用Limit时,查找到满足数量的结果集后会立即终止查询)
  • 找某列的最小值,如果该列有索引,只需要查找B+Tree索引最左端,反之则可以找到最大值
  • 在完成解析和优化阶段以后,MySQL会生成对应的执行计划,查询执行引擎根据执行计划给出的指令逐步执行得出结果。
  • 查询过程中的每一张表由一个handler实例表示。
  • MySQL在查询优化阶段就为每一张表创建了一个handler实例,优化器可以根据这些实例的接口来获取表的相关信息
  • 即使查询不到数据,MySQL仍然会返回这个查询的相关信息,比如改查询影响到的行数以及执行时间等等。
  • 结果集返回客户端是一个增量且逐步返回的过程
  • 整型就比字符操作代价低,因而会使用整型来存储ip地址,使用DATETIME来存储时间,而不是使用字符串。
  • 如果计划在列上创建索引,就应该将该列设置为NOT NULL
  • UNSIGNED表示不允许负值,大致可以使正数的上限提高一倍。
  • 没有太大的必要使用DECIMAL数据类型。即使是在需要存储财务数据时,仍然可以使用BIGINT
  • 大表ALTER TABLE非常耗时
  • MySQL执行大部分修改表结果操作的方法是用新的结构创建一个张空表,从旧表中查出所有的数据插入新表,然后再删除旧表
  • 索引是提高MySQL查询性能的一个重要途径,但过多的索引可能会导致过高的磁盘使用率以及过高的内存占用,从而影响应用程序的整体性能。
  • 索引是指B-Tree索引,它是目前关系型数据库中查找数据最为常用和有效的索引,大多数存储引擎都支持这种索引
  • B+Tree中的B是指balance,意为平衡。
  • 平衡二叉树首先需要符合二叉查找树的定义,其次必须满足任何节点的两个子树的高度差不能大于1。
  • 索引往往以索引文件的形式存储的磁盘上
  • 如何减少查找过程中的I/O存取次数?
  • 减少树的深度,将二叉树变为m叉树(多路搜索树),而B+Tree就是一种多路搜索树。
  • 页是计算机管理存储器的逻辑块,硬件及OS往往将主存和磁盘存储区分割为连续的大小相等的块
  • B+Tree为了保持平衡,对于新插入的值需要做大量的拆分页操作,而页的拆分需要I/O操作,为了尽可能的减少页的拆分操作,B+Tree也提供了类似于平衡二叉树的旋转功能。
  • 在多数情况下,在多个列上建立独立的索引并不能提高查询性能。理由非常简单,MySQL不知道选择哪个索引的查询效率更好
  • 当出现多个索引做联合操作时(多个OR条件),对结果集的合并、排序等操作需要耗费大量的CPU和内存资源,特别是当其中的某些索引的选择性不高,需要返回合并大量数据时,查询成本更高。
  • explain时如果发现有索引合并(Extra字段出现Using union),应该好好检查一下查询和表结构是不是已经是最优的
  • 索引选择性是指不重复的索引值和数据表的总记录数的比值
  • 唯一索引的选择性是1,这时最好的索引选择性,性能也是最好的。
  • 如果一个索引包含或者说覆盖所有需要查询的字段的值,那么就没有必要再回表查询,这就称为覆盖索引。
  • 对结果集进行排序的操作
  • 按照索引顺序扫描得出的结果自然是有序的
  • 如果explain的结果中type列的值为index表示使用了索引扫描来做排序。
  • 在设计索引时,如果一个索引既能够满足排序,有满足查询,是最好的。
  • 比如有一个索引(A,B),再创建索引(A)就是冗余索引。
  • 对于非常小的表,简单的全表扫描更高效。对于中到大型的表,索引就非常有效。对于超大型的表,建立和维护索引的代价随之增长,这时候其他技术也许更有效,比如分区表。
  • explain后再提测是一种美德。
  • 如果要统计行数,直接使用COUNT(*),意义清晰,且性能更好。
  • 执行EXPLAIN并不需要真正地去执行查询,所以成本非常低。
  • 确保ON和USING字句中的列上有索引。在创建索引的时候就要考虑到关联的顺序
  • 确保任何的GROUP BY和ORDER BY中的表达式只涉及到一个表中的列,这样MySQL才有可能使用索引来优化
  • MySQL关联执行的策略非常简单,它对任何的关联都执行嵌套循环关联操作,即先在一个表中循环取出单条数据,然后在嵌套循环到下一个表中寻找匹配的行,依次下去,直到找到所有表中匹配的行为为止。然后根据各个表匹配的行,返回查询中需要的各个列。
  • 最外层的查询是根据A.xx列来查询的,A.c上如果有索引的话,整个关联查询也不会使用。再看内层的查询,很明显B.c上如果有索引的话,能够加速查询,因此只需要在关联顺序中的第二张表的相应列上创建索引即可。
  • MySQL处理UNION的策略是先创建临时表,然后再把各个查询结果插入到临时表中,最后再来做查询。
  • 要使用UNION ALL,如果没有ALL关键字,MySQL会给临时表加上DISTINCT选项,这会导致整个临时表的数据做唯一性检查,这样做的代价非常高。
  • 尽可能不要使用存储过程
張 旭

MySQL on Docker: Running ProxySQL as Kubernetes Service | Severalnines - 0 views

  • Using Kubernetes ConfigMap approach, ProxySQL can be clustered with immutable configuration.
  • Kubernetes handles ProxySQL recovery and balance the connections to the instances automatically.
  • Can be used with external applications outside Kubernetes.
  • ...11 more annotations...
  • load balancing, connection failover and decoupling of the application tier from the underlying database topologies.
  • ProxySQL as a Kubernetes service (centralized deployment)
  • running as a service makes ProxySQL pods live independently from the applications and can be easily scaled and clustered together with the help of Kubernetes ConfigMap.
  • ProxySQL's multi-layer configuration system makes pod clustering possible with ConfigMap.
  • create ProxySQL pods and attach a Kubernetes service to be accessed by the other pods within the Kubernetes network or externally.
  • Default to 6033 for MySQL load-balanced connections and 6032 for ProxySQL administration console.
  • separated by "---" delimiter
  • deploy two ProxySQL pods as a ReplicaSet that matches containers labelled with "app=proxysql,tier=frontend".
  • A Kubernetes service is an abstraction layer which defines the logical set of pods and a policy by which to access them
  • The range of valid ports for NodePort resource is 30000-32767.
  • ConfigMap - To store ProxySQL configuration file as a volume so it can be mounted to multiple pods and can be remounted again if the pod is being rescheduled to the other Kubernetes node.
張 旭

Kubernetes Deployments: The Ultimate Guide - Semaphore - 1 views

  • Continuous integration gives you confidence in your code. To extend that confidence to the release process, your deployment operations need to come with a safety belt.
  • these Kubernetes objects ensure that you can progressively deploy, roll back and scale your applications without downtime.
  • A pod is just a group of containers (it can be a group of one container) that run on the same machine, and share a few things together.
  • ...34 more annotations...
  • the containers within a pod can communicate with each other over localhost
  • From a network perspective, all the processes in these containers are local.
  • we can never create a standalone container: the closest we can do is create a pod, with a single container in it.
  • Kubernetes is a declarative system (by opposition to imperative systems).
  • All we can do, is describe what we want to have, and wait for Kubernetes to take action to reconcile what we have, with what we want to have.
  • In other words, we can say, “I would like a 40-feet long blue container with yellow doors“, and Kubernetes will find such a container for us. If it doesn’t exist, it will build it; if there is already one but it’s green with red doors, it will paint it for us; if there is already a container of the right size and color, Kubernetes will do nothing, since what we have already matches what we want.
  • The specification of a replica set looks very much like the specification of a pod, except that it carries a number, indicating how many replicas
  • What happens if we change that definition? Suddenly, there are zero pods matching the new specification.
  • the creation of new pods could happen in a more gradual manner.
  • the specification for a deployment looks very much like the one for a replica set: it features a pod specification, and a number of replicas.
  • Deployments, however, don’t create or delete pods directly.
  • When we update a deployment and adjust the number of replicas, it passes that update down to the replica set.
  • When we update the pod specification, the deployment creates a new replica set with the updated pod specification. That replica set has an initial size of zero. Then, the size of that replica set is progressively increased, while decreasing the size of the other replica set.
  • we are going to fade in (turn up the volume) on the new replica set, while we fade out (turn down the volume) on the old one.
  • During the whole process, requests are sent to pods of both the old and new replica sets, without any downtime for our users.
  • A readiness probe is a test that we add to a container specification.
  • Kubernetes supports three ways of implementing readiness probes:Running a command inside a container;Making an HTTP(S) request against a container; orOpening a TCP socket against a container.
  • When we roll out a new version, Kubernetes will wait for the new pod to mark itself as “ready” before moving on to the next one.
  • If there is no readiness probe, then the container is considered as ready, as long as it could be started.
  • MaxSurge indicates how many extra pods we are willing to run during a rolling update, while MaxUnavailable indicates how many pods we can lose during the rolling update.
  • Setting MaxUnavailable to 0 means, “do not shutdown any old pod before a new one is up and ready to serve traffic“.
  • Setting MaxSurge to 100% means, “immediately start all the new pods“, implying that we have enough spare capacity on our cluster, and that we want to go as fast as possible.
  • kubectl rollout undo deployment web
  • the replica set doesn’t look at the pods’ specifications, but only at their labels.
  • A replica set contains a selector, which is a logical expression that “selects” (just like a SELECT query in SQL) a number of pods.
  • it is absolutely possible to manually create pods with these labels, but running a different image (or with different settings), and fool our replica set.
  • Selectors are also used by services, which act as the load balancers for Kubernetes traffic, internal and external.
  • internal IP address (denoted by the name ClusterIP)
  • during a rollout, the deployment doesn’t reconfigure or inform the load balancer that pods are started and stopped. It happens automatically through the selector of the service associated to the load balancer.
  • a pod is added as a valid endpoint for a service only if all its containers pass their readiness check. In other words, a pod starts receiving traffic only once it’s actually ready for it.
  • In blue/green deployment, we want to instantly switch over all the traffic from the old version to the new, instead of doing it progressively
  • We can achieve blue/green deployment by creating multiple deployments (in the Kubernetes sense), and then switching from one to another by changing the selector of our service
  • kubectl label pods -l app=blue,version=v1.5 status=enabled
  • kubectl label pods -l app=blue,version=v1.4 status-
  •  
    "Continuous integration gives you confidence in your code. To extend that confidence to the release process, your deployment operations need to come with a safety belt."
‹ Previous 21 - 28 of 28
Showing 20 items per page