Planet MySQL - http://www.planetmysql.org/
Tue, 30 Mar 2010 13:45:02 +0000


Using ClusterJ (part of MySQL Cluster Connector for Java) – a tutorial
By Andrew Morgan | Tue, 30 Mar 2010 12:40:27 +0000
http://www.clusterdb.com/mysql-cluster/using-clusterj-part-of-mysql-cluster-connector-for-java-a-tutorial/
Categories: MySQL Cluster, Java, JPA, MySQL Cluster 7.1, NDB API

[Fig. 1: Java access to MySQL Cluster]

ClusterJ is part of the MySQL Cluster Connector for Java, which is currently in beta as part of MySQL Cluster 7.1. It is designed to provide a high-performance way for Java applications to store and access data in a MySQL Cluster database, while being easy for Java developers to use; it is "in the style of" Hibernate/Java Data Objects (JDO) and JPA. It uses the Domain Object Model DataMapper pattern:

- Data is represented as domain objects
- Domain objects are separate from business logic
- Domain objects are mapped to database tables

The purpose of ClusterJ is to provide a mapping from the table-oriented view of the data stored in MySQL Cluster to the Java objects used by the application. This is achieved by annotating interfaces that represent the Java objects; each persistent interface is mapped to a table and each property in that interface to a column. By default, the table name matches the interface name and the column names match the property names, but this can be overridden using annotations.

[Fig. 2: ClusterJ Interface Annotations]

If the table does not already exist (for example, because this is a brand-new application with new data) then the table must be created manually – unlike OpenJPA, ClusterJ will not create the table automatically.

Figure 2 shows an example of an interface created to represent the data held in the 'employee' table.

ClusterJ uses the following concepts:

[Fig. 3: ClusterJ Terminology]

- SessionFactory: There is one instance per MySQL Cluster instance for each Java Virtual Machine (JVM). The application uses the SessionFactory to obtain Sessions. The configuration details for the ClusterJ instance are defined in the configuration properties, an artifact associated with the SessionFactory.
- Session: There is one instance per user (per Cluster, per JVM); it represents a connection to the Cluster.
- Domain Object: An object representing the data from a table. The domain objects (and their relationships to the Cluster tables) are defined by annotated interfaces, as shown in the right-hand side of Figure 2.
- Transaction: There is one transaction per session at any point in time. By default, each operation (query, insert, update, or delete) is run under a new transaction. The Transaction interface allows developers to aggregate multiple operations into a single, atomic unit of work, as the sketch below illustrates.
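The following short sketch is not part of the original tutorial; it illustrates that aggregation using the Session.currentTransaction(), begin() and commit() calls that appear in part 7 of the tutorial code below. The rollback() call and the two already-initialised domain objects (employeeA, employeeB) are assumptions added for illustration, and the fragment requires import com.mysql.clusterj.Transaction:

    Transaction tx = session.currentTransaction();
    tx.begin();
    try {
        session.persist(employeeA);  // hypothetical domain objects, initialised elsewhere
        session.persist(employeeB);
        tx.commit();                 // both rows become visible atomically
    } catch (RuntimeException e) {
        tx.rollback();               // on failure, neither row is written
        throw e;
    }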
ClusterJ will suit many Java developers, but it has some restrictions which may make OpenJPA with the ClusterJPA plug-in more appropriate. The ClusterJ restrictions are:

- Persistent interfaces rather than persistent classes: the developer provides the signatures for the getter/setter methods rather than the properties, and no extra methods can be added.
- No relationships can be defined between properties or between objects in the domain objects; properties are primitive types.
- No multi-table inheritance; there is a single table per persistent interface.
- No joins in queries (all data being queried must be in the same table/interface).
- No table creation – the user needs to create the tables and indexes.
- No lazy loading – the entire record is loaded at one time, including large objects (LOBs).

Tutorial

This tutorial uses MySQL Cluster 7.1.2a on Fedora 12. If you are using an earlier or more recent version of MySQL Cluster then you may need to change the class-paths, as explained in http://dev.mysql.com/doc/ndbapi/en/mccj-using-clusterj.html

It is necessary to have MySQL Cluster up and running. For simplicity, all of the nodes (processes) making up the Cluster will be run on the same physical host, along with the application. These are the MySQL Cluster configuration files being used:

config.ini:

    [ndbd default]
    noofreplicas=2
    datadir=/home/billy/mysql/my_cluster/data

    [ndbd]
    hostname=localhost
    id=3

    [ndbd]
    hostname=localhost
    id=4

    [ndb_mgmd]
    id=1
    hostname=localhost
    datadir=/home/billy/mysql/my_cluster/data

    [mysqld]
    hostname=localhost
    id=101

    [api]
    hostname=localhost

my.cnf:

    [mysqld]
    ndbcluster
    datadir=/home/billy/mysql/my_cluster/data
    basedir=/usr/local/mysql

This tutorial focuses on ClusterJ rather than on running MySQL Cluster; if you are new to MySQL Cluster then refer to "running a simple Cluster" (http://www.clusterdb.com/mysql-cluster/creating-a-simple-cluster-on-a-single-linux-host/) before trying this tutorial.

ClusterJ needs to be told how to connect to our MySQL Cluster database, including the connect string (the address/port for the management node), the database to use, the user to log in as, and attributes for the connection such as timeout values. If these parameters aren't defined then ClusterJ will fail with run-time exceptions. This information represents the "configuration properties" shown in Figure 3. The parameters can be hard-coded in the application, but it is more maintainable to create a clusterj.properties file that the application imports. This file should be stored in the same directory as your application source code.

clusterj.properties:

    com.mysql.clusterj.connectstring=localhost:1186
    com.mysql.clusterj.database=clusterdb
    com.mysql.clusterj.connect.retries=4
    com.mysql.clusterj.connect.delay=5
    com.mysql.clusterj.connect.verbose=1
    com.mysql.clusterj.connect.timeout.before=30
    com.mysql.clusterj.connect.timeout.after=20
    com.mysql.clusterj.max.transactions=1024
    com.mysql.clusterj.username=
    com.mysql.clusterj.password=
As ClusterJ will not create tables automatically, the next step is to create the 'clusterdb' database (referred to in clusterj.properties) and the 'employee' table:

    [billy@ws1 ~]$ mysql -u root -h 127.0.0.1 -P 3306
    mysql> create database clusterdb; use clusterdb;
    mysql> CREATE TABLE employee (
        ->   id INT NOT NULL PRIMARY KEY,
        ->   first VARCHAR(64) DEFAULT NULL,
        ->   last VARCHAR(64) DEFAULT NULL,
        ->   municipality VARCHAR(64) DEFAULT NULL,
        ->   started VARCHAR(64) DEFAULT NULL,
        ->   ended VARCHAR(64) DEFAULT NULL,
        ->   department INT NOT NULL DEFAULT 1,
        ->   UNIQUE KEY idx_u_hash (first,last) USING HASH,
        ->   KEY idx_municipality (municipality)
        -> ) ENGINE=NDBCLUSTER;

The next step is to create the annotated interface:

Employee.java:

    import com.mysql.clusterj.annotation.Column;
    import com.mysql.clusterj.annotation.Index;
    import com.mysql.clusterj.annotation.PersistenceCapable;
    import com.mysql.clusterj.annotation.PrimaryKey;

    @PersistenceCapable(table="employee")
    @Index(name="idx_uhash")
    public interface Employee {

        @PrimaryKey
        int getId();
        void setId(int id);

        String getFirst();
        void setFirst(String first);

        String getLast();
        void setLast(String last);

        @Column(name="municipality")
        @Index(name="idx_municipality")
        String getCity();
        void setCity(String city);

        String getStarted();
        void setStarted(String date);

        String getEnded();
        void setEnded(String date);

        Integer getDepartment();
        void setDepartment(Integer department);
    }

The name of the table is specified in the @PersistenceCapable(table="employee") annotation, and each column from the employee table has an associated getter and setter method defined in the interface. By default, the property name in the interface is the same as the column name in the table; here the column name has been overridden for the City property by including the @Column(name="municipality") annotation just before the associated getter method. The @PrimaryKey annotation identifies the property whose associated column is the primary key of the table, and ClusterJ is made aware of existing indexes in the database through the @Index annotation.
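To make the default name mapping concrete, here is a minimal hypothetical interface for a 'department' table (no such table is used in this tutorial). Because each property name matches its column name, no @Column annotations are needed; only the table and the primary key are identified, and the table attribute is shown explicitly for clarity:

    import com.mysql.clusterj.annotation.PersistenceCapable;
    import com.mysql.clusterj.annotation.PrimaryKey;

    // Hypothetical table: department (id INT PRIMARY KEY, name VARCHAR(64)).
    // Property 'id' maps to column 'id', property 'name' to column 'name'.
    @PersistenceCapable(table="department")
    public interface Department {
        @PrimaryKey
        int getId();
        void setId(int id);

        String getName();
        void setName(String name);
    }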
The next step is to write the application code, which we step through block by block. The first block simply contains the import statements and then loads the contents of the clusterj.properties file defined above (see "Compiling and running the ClusterJ tutorial code" below for details on building and running the tutorial):

Main.java (part 1):

    import com.mysql.clusterj.ClusterJHelper;
    import com.mysql.clusterj.SessionFactory;
    import com.mysql.clusterj.Session;
    import com.mysql.clusterj.Query;
    import com.mysql.clusterj.query.QueryBuilder;
    import com.mysql.clusterj.query.QueryDomainType;

    import java.io.*;
    import java.util.Properties;
    import java.util.List;

    public class Main {

        public static void main(String[] args) throws java.io.FileNotFoundException, java.io.IOException {

            // Load the properties from the clusterj.properties file
            File propsFile = new File("clusterj.properties");
            InputStream inStream = new FileInputStream(propsFile);
            Properties props = new Properties();
            props.load(inStream);

            // Used later to get user input
            BufferedReader br = new BufferedReader(new InputStreamReader(System.in));

The next step is to get a handle for a SessionFactory from the ClusterJHelper class and then use that factory to create a session (based on the properties imported from the clusterj.properties file).

Main.java (part 2):

    // Create a session (connection to the database)
    SessionFactory factory = ClusterJHelper.getSessionFactory(props);
    Session session = factory.getSession();
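One detail the tutorial omits for brevity is cleanup. A minimal sketch of a more defensive pattern, assuming the Session.close() method is available in your ClusterJ release:

    // Ensure the Cluster connection is released even if an operation throws.
    Session session = factory.getSession();
    try {
        // ... the database operations shown below ...
    } finally {
        session.close();  // assumption: close() releases the connection
    }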
Now that we have a session, it is possible to instantiate new Employee objects and then persist them to the database. Where there are no transaction begin() or commit() statements, each operation involving the database is treated as a separate transaction.

Main.java (part 3):

    // Create and initialise an Employee
    Employee newEmployee = session.newInstance(Employee.class);
    newEmployee.setId(988);
    newEmployee.setFirst("John");
    newEmployee.setLast("Jones");
    newEmployee.setStarted("1 February 2009");
    newEmployee.setDepartment(666);

    // Write the Employee to the database
    session.persist(newEmployee);

At this point, a row will have been added to the 'employee' table. To verify this, a new Employee object is created and used to read the data back from the 'employee' table using the primary key (id) value of 988:

Main.java (part 4):

    // Fetch the Employee from the database
    Employee theEmployee = session.find(Employee.class, 988);

    if (theEmployee == null) {
        System.out.println("Could not find employee");
    } else {
        System.out.println("ID: " + theEmployee.getId() + "; Name: "
                + theEmployee.getFirst() + " " + theEmployee.getLast());
        System.out.println("Location: " + theEmployee.getCity());
        System.out.println("Department: " + theEmployee.getDepartment());
        System.out.println("Started: " + theEmployee.getStarted());
        System.out.println("Left: " + theEmployee.getEnded());
    }

This is the output seen at this point:

    ID: 988; Name: John Jones
    Location: null
    Department: 666
    Started: 1 February 2009
    Left: null
    Check the database before I change the Employee - hit return when you are done

The next step is to modify this data; it is not written back to the database yet:

Main.java (part 5):

    // Make some changes to the Employee & write back to the database
    theEmployee.setDepartment(777);
    theEmployee.setCity("London");

    System.out.println("Check the database before I change the Employee - hit return when you are done");
    String ignore = br.readLine();

The application will pause at this point, giving you a chance to check the database and confirm that the original data has been added as a new row but the changes have not been written back yet:

    mysql> select * from clusterdb.employee;
    +-----+-------+-------+--------------+-----------------+-------+------------+
    | id  | first | last  | municipality | started         | ended | department |
    +-----+-------+-------+--------------+-----------------+-------+------------+
    | 988 | John  | Jones | NULL         | 1 February 2009 | NULL  | 666        |
    +-----+-------+-------+--------------+-----------------+-------+------------+

After hitting return, the application continues and writes the changes to the table, using an automatic transaction to perform the update.

Main.java (part 6):

    session.updatePersistent(theEmployee);

    System.out.println("Check the change in the table before I bulk add Employees - hit return when you are done");
    ignore = br.readLine();

The application will again pause so that we can check that the change has been written back (persisted) to the database:

    mysql> select * from clusterdb.employee;
    +-----+-------+-------+--------------+-----------------+-------+------------+
    | id  | first | last  | municipality | started         | ended | department |
    +-----+-------+-------+--------------+-----------------+-------+------------+
    | 988 | John  | Jones | London       | 1 February 2009 | NULL  | 777        |
    +-----+-------+-------+--------------+-----------------+-------+------------+

The application then goes on to create and persist 100 new employees. To improve performance, a single transaction is used so that all of the changes are written to the database at once when the commit() statement runs:

Main.java (part 7):

    // Add 100 new Employees - all as part of a single transaction
    newEmployee.setFirst("Billy");
    newEmployee.setStarted("28 February 2009");

    session.currentTransaction().begin();

    for (int i = 700; i < 800; i++) {
        newEmployee.setLast("No-Mates" + i);
        newEmployee.setId(i + 1000);
        newEmployee.setDepartment(i);
        session.persist(newEmployee);
    }

    session.currentTransaction().commit();

The 100 new employees will now have been persisted to the database.
The next step is to create and execute a query that searches the database for all employees in department 777, using a QueryBuilder to build a QueryDomainType that compares the 'department' column with a parameter. After creating the query, the department parameter is set to 777 (the query could subsequently be reused with different department numbers – see the sketch below). The application then runs the query, iterates through the result set and displays each employee:

Main.java (part 8):

    // Retrieve the set of all Employees in department 777
    QueryBuilder builder = session.getQueryBuilder();
    QueryDomainType<Employee> domain = builder.createQueryDefinition(Employee.class);
    domain.where(domain.get("department").equal(domain.param("department")));
    Query<Employee> query = session.createQuery(domain);
    query.setParameter("department", 777);

    List<Employee> results = query.getResultList();
    for (Employee deptEmployee : results) {
        System.out.println("ID: " + deptEmployee.getId() + "; Name: "
                + deptEmployee.getFirst() + " " + deptEmployee.getLast());
        System.out.println("Location: " + deptEmployee.getCity());
        System.out.println("Department: " + deptEmployee.getDepartment());
        System.out.println("Started: " + deptEmployee.getStarted());
        System.out.println("Left: " + deptEmployee.getEnded());
    }

    System.out.println("Last chance to check database before emptying table - hit return when you are done");
    ignore = br.readLine();

At this point, the application will display the following and prompt the user to allow it to continue:

    ID: 988; Name: John Jones
    Location: London
    Department: 777
    Started: 1 February 2009
    Left: null
    ID: 1777; Name: Billy No-Mates777
    Location: null
    Department: 777
    Started: 28 February 2009
    Left: null

We can compare that output with an SQL query performed on the database:

    mysql> select * from employee where department=777;
    +------+-------+-------------+--------------+------------------+-------+------------+
    | id   | first | last        | municipality | started          | ended | department |
    +------+-------+-------------+--------------+------------------+-------+------------+
    |  988 | John  | Jones       | London       | 1 February 2009  | NULL  | 777        |
    | 1777 | Billy | No-Mates777 | NULL         | 28 February 2009 | NULL  | 777        |
    +------+-------+-------------+--------------+------------------+-------+------------+
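As noted above, the compiled query can be reused with a different parameter value rather than being rebuilt. A minimal sketch, using the same Query object from part 8 (the department number 750 is hypothetical, picked from the bulk-loaded range):

    // Re-run the same query for another department without rebuilding it
    query.setParameter("department", 750);
    List<Employee> dept750 = query.getResultList();
    System.out.println("Employees in department 750: " + dept750.size());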
Finally, after pressing return again, the application will remove all employees:

Main.java (part 9):

        session.deletePersistentAll(Employee.class);
        }
    }

As a final check, an SQL query confirms that all of the rows have been deleted from the 'employee' table:

    mysql> select * from employee;
    Empty set (0.00 sec)

Compiling and running the ClusterJ tutorial code

    javac -classpath /usr/local/mysql/share/mysql/java/clusterj-api.jar:. Main.java Employee.java
    java -classpath /usr/local/mysql/share/mysql/java/clusterj.jar:. -Djava.library.path=/usr/local/mysql/lib Main

Download the source code for this tutorial from http://www.clusterdb.com/ClusterJ_Examples.tar.gz (together with the code for the upcoming ClusterJPA tutorial).
MySQL absent from Google Summer of Code this year
By Henrik Ingo | Tue, 30 Mar 2010 11:22:13 +0000
http://openlife.cc/blogs/2010/march/mysql-absent-google-summer-code-year
Categories: MySQL, Open Source

Google Summer of Code (http://socghop.appspot.com/) is now open for student applications, and people (like Ronald Bradford: http://ronaldbradford.com/blog/how-to-find-mysql-developers-2010-03-24/) are noticing that MySQL is not participating this year. (Drizzle, on the other hand, is a mentoring organization: http://www.joinfu.com/2010/03/holy-google-summer-of-code-batman/)

Read more: http://openlife.cc/blogs/2010/march/mysql-absent-google-summer-code-year
Not Only NoSQL!! Uber Scaling-Out with SPIDER storage engine
By Mikiya Okuno | Tue, 30 Mar 2010 09:58:24 +0000
http://samurai-mysql.blogspot.com/2010/03/not-only-nosql-uber-scaling-out-with.html
Categories: spider, mysql, twitter, sharding

History tells us that a single RDBMS node cannot handle the traffic of a web system serving users all over the world, no matter how well the database is tuned. MySQL has had master/slave replication built in for a long time, and it has enabled web applications to handle traffic using a scale-out strategy. Having many slaves suits web sites where most of the traffic is reads, so MySQL's master/slave replication has been used on many web sites and still is.

However, when a site grows large, the amount of traffic may exceed the replication's capacity. In such cases, people may use memcached: an in-memory, very fast and well-known KVS (key-value store) whose read throughput is far better than MySQL's. It has been used as a cache for web applications to store 'hot' data, with MySQL as the back-end storage, because it can dramatically reduce the read requests sent to MySQL.

While 1:N replication can scale the read workload and memcached can reduce read requests, neither eases the write load well, so write traffic keeps growing as a web site becomes huge. On such web sites, a technique called "sharding" has been used: the application chooses the appropriate MySQL server from among several servers. In this way, MySQL+memcached has long been the de facto standard data store for huge web sites.

Since web applications are still getting larger, especially social media sites, the write load keeps climbing as people communicate in real time. In this area yet another technique is required to handle the write load, and some people have chosen NoSQL solutions instead of MySQL+memcached. NoSQL is something of a buzzword, IMHO, covering non-relational databases that do not require SQL access. Despite the lack of SQL access, some NoSQL packages, like Cassandra, are suitable for huge-scale web applications. Although you cannot JOIN records on a NoSQL system, that is equally impossible on an RDBMS across shards; so on such web applications MySQL isn't really used as an RDBMS in the first place – it is used, in other words, as a data store without joins.

For more on this line of thought, I recommend reading Mark Callaghan's post (http://mysqlha.blogspot.com/2010/03/plays-well-with-others.html) and this post: http://nosql.mypopescu.com/post/407159447/cassandra-twitter-an-interview-with-ryan-king

Technically it is possible to handle a huge amount of traffic using MySQL, but the running cost gets expensive, Twitter says. Because these are three separate technologies, the people who build applications on top of them and manage them have to spend time learning all three. Cassandra (http://en.wikipedia.org/wiki/Apache_Cassandra), on the other hand, can handle more traffic as a single database management system, so people only have to learn one thing instead of three. Sounds great? But is it really a good choice?

No! They're not aware of yet another solution: the SPIDER storage engine!

SPIDER for MySQL
http://spiderformysql.com/

SPIDER is a storage engine developed by a Japanese MySQL hacker, Mr. Kentoku Shiba. It makes use of MySQL's partitioning functionality and stores the partitioned data on remote servers – I would call it a sharding storage engine. The flexibility of MySQL's storage engine API is what makes such an engine possible, but I value Kentoku's design highly.

[Figure: how the SPIDER storage engine works (snippet from the site above)]

In this entry I do not explain how to use the SPIDER storage engine, but I will show you how great its ability is. If you want to try it out, please refer to Giuseppe Maxia's post (http://datacharmer.blogspot.com/2009/04/test-driving-spider-storage-engine.html).

Please look at the following graph, which compares INSERT performance on a single MySQL server (InnoDB), 2 SPIDER nodes + 2 back-end MySQL servers, and 4 SPIDER nodes + 4 back-end MySQL servers. You can see how well it scales.

[Graph: INSERT performance]

The next graph shows SELECT performance; reads scale pretty well too.

[Graph: SELECT performance]

Red circles indicate where the working-set size exceeds the memory size. While performance drops once the working set exceeds the available memory, SPIDER can expand the memory so that the working set fits: it can make use of the memory on all the remote servers, as if there were one huge buffer pool in total.

For more about SPIDER's performance tests, please refer to Kentoku's slides (http://www.slideshare.net/Kentoku/spider-performance-testbench-mark04242009). They are surprising.

The most significant problem for Twitter is scaling out the read/write load at a lower running cost. Unfortunately, they chose a NoSQL solution because "MySQL replication + memcached + sharding" cannot handle a write-intensive workload well. However, such a problem can be solved using the SPIDER storage engine with MySQL!

Generally, a KVS cannot solve certain problems, such as:

- JOIN
- Sorting (ORDER BY)
- Aggregation (GROUP BY)

With a KVS these can be handled using MapReduce, but in general we can do the same task with a very simple SQL statement; SQL lets us implement complex logic very efficiently. When I asked Kentoku for permission to write an article about his storage engine, he told me his philosophy:

"I think the most significant benefit of using an RDB is its usefulness and flexibility. That is a very important characteristic for developers who need to keep their applications competitive, especially those who have to add new features and functionality day by day, as on web services. I develop the SPIDER storage engine to give developers those useful and flexible RDB characteristics, even in environments where the traffic and data are so huge that sharding is required."

I agree with his opinion 100%. If you are facing problems caused by high traffic and huge data, just like Twitter, please consider the SPIDER storage engine before migrating to a NoSQL solution.
If you are facing the problem caused by high traffic and huge data just like twitter, please consider to use SPIDER storage engine before migrating to NoSQL solutions.<div><img width="1" height="1" src="https://blogger.googleusercontent.com/tracker/1528820875793831782-7521867821875225744?l=samurai-mysql.blogspot.com" alt="" /></div><br/>PlanetMySQL Voting: <a href="http://planet.mysql.com/entry/vote/?entry_id=24092&vote=1&apivote=1">Vote UP</a> / <a href="http://planet.mysql.com/entry/vote/?entry_id=24092&vote=-1&apivote=1">Vote DOWN</a>";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:12:"Mikiya Okuno";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:3;a:6:{s:4:"data";s:58:" ";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:46:"MySQL scale-out with replication and Pacemaker";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:34:"http://fghaas.wordpress.com/?p=372";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:86:"http://fghaas.wordpress.com/2010/03/30/mysql-scale-out-with-replication-and-pacemaker/";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:999:"It seems that the MySQL Conference selection committee didn’t seem to like the talk (and tutorial) I submitted about integrating MySQL Replication with the Pacemaker cluster stack, enabling full MySQL scale-out in an environment you previously knew only for its synchronous-replication High Availability features. But — fear not, my fellow HA geeks! — we are instead making the talk available for free, over the web, one week ahead of the conference. In this 45-minute presentation, we will give an overview about the Pacemaker cluster stack and the Linux-HA project it evolved out of; the current state of MySQL integration in Pacemaker, and of course leveraging Pacemaker’s built-in master-slave clustering infrastructure for MySQL Replication. It’s pretty cool stuff that the Linux-HA community has put together there. Join us for this webcast next Wednesday, April 7, at 1500 UTC! Registration is free. Joining the webinar requires a Java capable browser. 
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Tue, 30 Mar 2010 09:31:50 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:4:{i:0;a:5:{s:4:"data";s:9:"Heartbeat";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:8:"Linux-HA";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:5:"MySQL";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:3;a:5:{s:4:"data";s:9:"Pacemaker";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:2580:"<p>It seems that the MySQL Conference selection committee didn’t seem to like the talk (and tutorial) I submitted about integrating MySQL Replication with the <a href="http://www.clusterlabs.org">Pacemaker</a> cluster stack, enabling full MySQL scale-out in an environment you <a href="http://www.mysql.com/drbd">previously knew only for its synchronous-replication High Availability</a> features. But — fear not, my fellow HA geeks! — we are instead making the talk available for free, over the web, one week ahead of the conference.</p> <p><span></span>In this 45-minute presentation, we will give an overview about</p> <ul> <li>the Pacemaker cluster stack and the Linux-HA project it evolved out of;</li> <li>the current state of MySQL integration in Pacemaker, and of course</li> <li>leveraging Pacemaker’s built-in master-slave clustering infrastructure for MySQL Replication.</li> </ul> <p>It’s pretty cool stuff that the Linux-HA community has put together there.</p> <p><a href="https://linbit.webex.com/linbit-en/onstage/g.php?t=a&d=847289372">Join us for this webcast next Wednesday, April 7, at 1500 UTC!</a></p> <p>Registration is free. 
Joining the webinar requires a Java-capable browser.</p>";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:15:"Florian G. Haas";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:4;a:6:{s:4:"data";s:108:" ";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:44:"MySQL: Partition-wise backups with mysqldump";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:59:"tag:blogger.com,1999:blog-15319370.post-6531770173905277751";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:75:"http://rpbouman.blogspot.com/2010/03/mysql-partition-wise-backups-with.html";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:3120:"To whom it may concern, in response to a query from André Simões (also known as ITXpander), I slapped together a MySQL script that outputs mysqldump commands for backing up individual partitions of the tables in the current schema. The script is maintained as a snippet at MySQL Forge. How it works: The script works by querying the information_schema.PARTITIONS system view to generate an appropriate expression for mysqldump's --where option. 
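As a rough illustration of the idea (a sketch only, not the actual Forge snippet), a query along the following lines emits one mysqldump command per RANGE partition of the current schema; the user name and password placeholders are hypothetical, and a complete version would also need to add the lower bound from the preceding partition and special-case MAXVALUE:
SELECT CONCAT(
  'mysqldump --user=username --password=password --no-create-info',
  ' --where="', PARTITION_EXPRESSION, ' < ', PARTITION_DESCRIPTION, '"',
  ' ', TABLE_SCHEMA, ' ', TABLE_NAME,
  ' > ', TABLE_SCHEMA, '.', TABLE_NAME, '.', PARTITION_NAME, '.sql')
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = SCHEMA()
  AND PARTITION_METHOD = 'RANGE'
ORDER BY TABLE_NAME, PARTITION_ORDINAL_POSITION;
Run through the mysql client with --skip-column-names, the output is directly executable in a shell.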
The generated command also redirects the output to a file with this name pattern:
<schema>.<table>.<partition-name>.sql
For example, for this table (taken from the MySQL reference manual):
CREATE TABLE members (
 firstname VARCHAR(25) NOT NULL,
 lastname VARCHAR(25) NOT NULL,
 username VARCHAR(16) NOT NULL,
 email VARCHAR(35),
 joined DATE NOT NULL
)
PARTITION BY RANGE( YEAR(joined) ) (
 PARTITION p0 VALUES LESS THAN (1960),
 PARTITION p1 VALUES LESS THAN (1970),
 PARTITION p2 VALUES LESS THAN (1980),
 PARTITION p3 VALUES LESS THAN (1990),
 PARTITION p4 VALUES LESS THAN MAXVALUE
);
the script generates the following commands:
mysqldump --user=username --password=password --no-create-info --where=" YEAR(joined) < 1960" test members > test.members.p0.sql
mysqldump --user=username --password=password --no-create-info --where=" YEAR(joined) >= 1960 and YEAR(joined) < 1970" test members > test.members.p1.sql
mysqldump --user=username --password=password --no-create-info --where=" YEAR(joined) >= 1970 and YEAR(joined) < 1980" test members > test.members.p2.sql
mysqldump --user=username --password=password --no-create-info --where=" YEAR(joined) >= 1980 and YEAR(joined) < 1990" test members > test.members.p3.sql
mysqldump --user=username --password=password --no-create-info --where=" YEAR(joined) >= 1990 and YEAR(joined) < 18446744073709551615" test members > test.members.p4.sql
Tip: in order to obtain directly executable output from the mysql command line tool, run the script with the --skip-column-names (or -N) option.
Features: Currently, the script supports the following partitioning methods: HASH, LIST, RANGE.
Limitations: The LINEAR HASH method is currently not supported, but I may implement that in the future. Currently I do not have plans to implement the KEY and LINEAR KEY partitioning methods, but I may reconsider if and when I have more information about the storage-engine specific partitioning methods used by these methods. Finally, I should point out that querying the information_schema.PARTITIONS table is dog-slow. This may not be too big of an issue, however it is pretty annoying. If anybody has some tips to increase performance, please let me know.
Acknowledgements: Thanks to André for posing the problem. I had a fun hour of procrastination to implement this, and it made me read part of the MySQL reference manual on partitioning. I also would like to thank Giuseppe Maxia (the Datacharmer) for providing valuable feedback. 
If you're interested in either partitioning or the mysql command line, you should visit his tutorials at the MySQL conference, April 12-15, 2010.";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Tue, 30 Mar 2010 09:00:00 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:14:{i:0;a:5:{s:4:"data";s:12:"partitioning";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:10:"MySQLForge";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:9:"mysqlconf";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:3;a:5:{s:4:"data";s:3:"DBA";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:4;a:5:{s:4:"data";s:23:"MySQL command line tool";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:5;a:5:{s:4:"data";s:18:"information_schema";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:6;a:5:{s:4:"data";s:16:"MySQL Conference";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:7;a:5:{s:4:"data";s:14:"Giuseppe Maxia";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:8;a:5:{s:4:"data";s:9:"mysqldump";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:9;a:5:{s:4:"data";s:5:"MySQL";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:10;a:5:{s:4:"data";s:14:"André Simões";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:11;a:5:{s:4:"data";s:6:"backup";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:12;a:5:{s:4:"data";s:25:"MySQL command line client";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:13;a:5:{s:4:"data";s:8:"MySQL UC";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:5749:"To whom it may concern,<br /><br />in response to <a target="andre" href="http://twitter.com/ITXpander/status/11257597174">a query</a> from André Simões (also known as <a href="http://itxpander.wordpress.com/" target="andre">ITXpander</a>), I slapped together a MySQL script that outputs <code><a href="http://dev.mysql.com/doc/refman/5.1/en/mysqldump.html" target="mysql">mysqldump</a></code> commands for backing up individual partitions of the tables in the current schema. The script is maintained as <a target="mysqlforge" href="http://forge.mysql.com/tools/tool.php?id=258">a snippet</a> at <a href="http://rpbouman.blogspot.com/" target="mysqlforge">MySQL Forge</a>. <h3>How it works</h3>The script works by querying the <code><a href="http://dev.mysql.com/doc/refman/5.1/en/partitions-table.html" target="mysql">information_schema.PARTITIONS</a></code> system view to generate an appropriate expression for mysqldump's <code><a href="http://dev.mysql.com/doc/refman/5.1/en/mysqldump.html#option_mysqldump_where" target="mysql">--where</a></code> option. 
The generated command also redirects the output to a file with this name pattern:<pre><schema>.<table>.<partition-name>.sql</pre>For example, for this table (<a href="http://dev.mysql.com/doc/refman/5.1/en/partitioning-types.html" target="mysql">taken from</a> the MySQL reference manual):<pre>CREATE TABLE members (<br /> firstname VARCHAR(25) NOT NULL,<br /> lastname VARCHAR(25) NOT NULL,<br /> username VARCHAR(16) NOT NULL,<br /> email VARCHAR(35),<br /> joined DATE NOT NULL<br />)<br />PARTITION BY RANGE( YEAR(joined) ) (<br /> PARTITION p0 VALUES LESS THAN (1960),<br /> PARTITION p1 VALUES LESS THAN (1970),<br /> PARTITION p2 VALUES LESS THAN (1980),<br /> PARTITION p3 VALUES LESS THAN (1990),<br /> PARTITION p4 VALUES LESS THAN MAXVALUE<br />);</pre> the script generates the following commands:<pre>mysqldump --user=username --password=password --no-create-info --where=" YEAR(joined) < 1960" test members > test.members.p0.sql<br />mysqldump --user=username --password=password --no-create-info --where=" YEAR(joined) >= 1960 and YEAR(joined) < 1970" test members > test.members.p1.sql<br />mysqldump --user=username --password=password --no-create-info --where=" YEAR(joined) >= 1970 and YEAR(joined) < 1980" test members > test.members.p2.sql<br />mysqldump --user=username --password=password --no-create-info --where=" YEAR(joined) >= 1980 and YEAR(joined) < 1990" test members > test.members.p3.sql<br />mysqldump --user=username --password=password --no-create-info --where=" YEAR(joined) >= 1990 and YEAR(joined) < 18446744073709551615" test members > test.members.p4.sql</pre>Tip: in order to obtain directly executable output from the <code><a href="http://dev.mysql.com/doc/refman/5.1/en/mysql.html" target="mysql">mysql</a></code> command line tool, run the script with the <code><a href="http://dev.mysql.com/doc/refman/5.1/en/mysql-command-options.html#option_mysql_skip-column-names" target="mysql">--skip-column-names</a></code> (or <code>-N</code>) option.<h3>Features</h3>Currently, the script supports the following <a href="http://dev.mysql.com/doc/refman/5.1/en/partitioning-types.html" target="mysql">partitioning methods</a>:<ul><br /><li><code><a target="mysql" href="http://dev.mysql.com/doc/refman/5.1/en/partitioning-hash.html">HASH</a></code></li><br /><li><code><a target="mysql" href="http://dev.mysql.com/doc/refman/5.1/en/partitioning-list.html">LIST</a></code></li><br /><li><code><a target="mysql" href="http://dev.mysql.com/doc/refman/5.1/en/partitioning-range.html">RANGE</a></code></li><br /></ul><h3>Limitations</h3>The <code><a target="mysql" href="http://dev.mysql.com/doc/refman/5.1/en/partitioning-linear-hash.html">LINEAR HASH</a></code> method is currently not supported, but I may implement that in the future. <br /><br />Currently I do not have plans to implement the <code><a target="mysql" href="http://dev.mysql.com/doc/refman/5.1/en/partitioning-key.html">KEY</a></code> and <code><a target="mysql" href="http://dev.mysql.com/doc/refman/5.1/en/partitioning-linear-key.html">LINEAR KEY</a></code> partitioning methods, but I may reconsider if and when I have more information about the storage-engine specific partitioning methods used by these methods.<br /><br />Finally, I should point out that querying the <code>information_schema.PARTITIONS</code> table is dog-slow. This may not be too big of an issue, however it is pretty annoying. If anybody has some tips to increase performance, please let me know.<h3>Acknowledgements</h3>Thanks to André for posing the problem. 
I had a fun hour of procrastination to implement this, and it made me read part of the <a href="http://dev.mysql.com/doc/refman/5.1/en/partitioning.html" target="mysql">MySQL reference manual on partitioning</a>. <br /><br />I also would like to thank Giuseppe Maxia (<a href="http://datacharmer.blogspot.com/" target="giuseppe">the Datacharmer</a>) for providing valuable feedback. If you're interested in either partitioning or the mysql command line, you should visit <a href="http://en.oreilly.com/mysql2010/public/schedule/speaker/32" target="mysqlconf">his tutorials</a> at the <a href="http://en.oreilly.com/mysql2010/" target="mysqlconf">MySQL conference</a>, April 12-15, 2010.";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:13:"Roland Bouman";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:5;a:6:{s:4:"data";s:58:" ";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:44:"Storing the table message in Embedded InnoDB";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:40:"http://www.flamingspork.com/blog/?p=1874";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:89:"http://www.flamingspork.com/blog/2010/03/30/storing-the-table-message-in-embedded-innodb/";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:1671:"One of the exciting things[1] about working on a storage engine in Drizzle is that you get to manage your own metadata. When the database engine you’re writing the storage engine interface for has a pretty complete data dictionary (e.g. Embedded InnoDB) you could just use it directly. At some point I plan to do this for the embedded_innodb engine for Drizzle so that you could just point Drizzle at an existing Embedded InnoDB database and run SQL queries on it. The Drizzle table message does have some things in it that aren’t in the InnoDB data dictionary though (e.g. table and column comments). We want to preserve these (and also handle the fact that several data types in Drizzle may map to the same data type in InnoDB). Since the Embedded InnoDB API allows us to do things within the DDL transaction (such as insert a row into a table), we store the serialized table message in a table as part of the DDL transaction. This means we can have fully crash-safe DDL! There is no way the table definition can get out of sync with what is in InnoDB; we are manipulating them both in the same transaction! The table structure we’re using is pretty simple. There are two columns: table_name VARCHAR(IB_MAX_TABLE_NAME_LEN) and message BLOB. 
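In ordinary SQL terms, the metadata table would look something like the sketch below (the real table is created through the Embedded InnoDB API, the table name here is made up, and VARCHAR(64) merely stands in for IB_MAX_TABLE_NAME_LEN):
CREATE TABLE table_message (
  table_name VARCHAR(64) NOT NULL, -- IB_MAX_TABLE_NAME_LEN in the real code
  message    BLOB NOT NULL,        -- the serialized Drizzle table message
  PRIMARY KEY (table_name)
);
Keying on table_name gives exactly the operations listed next: point INSERT/UPDATE/DELETE plus prefix scans for listing a database's tables.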
The operations we need are: store the table message in doCreateTable (INSERT) rename the table message in doRenameTable (UPDATE the table_name column) delete the table message in doDropTable (DELETE) list tables in a database (SELECT with prefix) get table message (SELECT using key lookup) All of which are pretty easy to implement using the Embedded InnoDB API. [1] Maybe I need to get out more….";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Tue, 30 Mar 2010 05:06:17 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:4:{i:0;a:5:{s:4:"data";s:7:"drizzle";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:15:"embedded_innodb";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:6:"innodb";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:3;a:5:{s:4:"data";s:13:"table message";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:2095:"<p>One of the exciting things[1] about working on a storage engine in <a href="http://drizzle.org">Drizzle</a> is that you get to manage your own metadata. When the database engine you’re writing the storage engine interface for has a pretty complete data dictionary (e.g. <a href="http://www.innodb.com/wp/products/embedded-innodb/">Embedded InnoDB</a>) you could just directly use it. At some point I plan to do this for the embedded_innodb engine for Drizzle so that you could just point Drizzle at an existing Embedded InnoDB database and run SQL queries on it.</p> <p>The Drizzle table message does have some things in it that aren’t in the InnoDB data dictionary though (e.g. table and column comments). We want to preserve these (and also things like there may be several data types in Drizzle that map to the same data type in InnoDB). Since the Embedded InnoDB API allows us to do things within the DDL transaction (such as insert a row into a table), we store the serialized table message in a table as part of the DDL transaction. This means we can have <strong>fully crash safe DDL</strong>! There is no way the table definition can get out of sync with what is in InnoDB; we are manipulating them both in the same transaction!</p> <p>The table structure we’re using is pretty simple. 
There are two columns: table_name VARCHAR(IB_MAX_TABLE_NAME_LEN) and message BLOB.</p> <p>The operations we need are:</p> <ul> <li>store the table message in doCreateTable (INSERT)</li> <li>rename the table message in doRenameTable (UPDATE the table_name column)</li> <li>delete the table message in doDropTable (DELETE)</li> <li>list tables in a database (SELECT with prefix)</li> <li>get table message (SELECT using key lookup)</li> </ul> <p>All of which are pretty easy to implement using the Embedded InnoDB API.</p> <p>[1] Maybe I need to get out more….</p>";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:13:"Stewart Smith";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:6;a:6:{s:4:"data";s:33:" ";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:2:{s:0:"";a:5:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:54:"MySQL 5.1 performance for a CPU-bound, update workload";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:53:"http://www.facebook.com/note.php?note_id=380695545932";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:53:"http://www.facebook.com/note.php?note_id=380695545932";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:6018:"Using the configuration described elsewhere I ran a modified version of sysbench to compare PBXT, InnoDB and MyISAM using two update-intensive workloads on a CPU-bound server. The workloads are simple but reproduce a bottleneck that is intermittent on some of our MySQL servers. Each workload is one statement run by many concurrent connections. In both cases the statements increment a column in one very popular row. One test uses an update statement like UPDATE sbtest set c=c+1 where id=1. The other test uses an insert statement like INSERT INTO sbtest values(1,0,'0','ayyyyyyyyy') ON DUPLICATE KEY UPDATE c = c + 1 and always executes the ON DUPLICATE KEY UPDATE clause. Note that this is a very narrow benchmark (nothing but updates, CPU-bound server) but it tests a case that I care about. My summary of the results is:
- There is significant overhead from using the binlog with sync_binlog=1 even when fsync is fast. We are trying to fix this by supporting group commit for the binlog fsync. Ryan McElroy is working on it and will talk about this at the MySQL conference. The MySQL replication and MariaDB teams are also working on the problem. We hope to learn that the InnoDB team is also working on a fix.
- InnoDB is much faster when deadlock detection is disabled for workloads with a lot of concurrency and contention.
- PBXT continues to be competitive.
- PBXT needs to get better at INSERT ON DUPLICATE KEY UPDATE.
- MyISAM is faster. This doesn't surprise me. I am surprised that InnoDB is very close to MyISAM on some of the tests. 
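For reference, here is a minimal sketch of the kind of table and the two hot-row statements involved; the real sysbench sbtest table differs in details, so treat the DDL as illustrative only:
-- Illustrative sbtest-like table; every client hammers the row with id=1.
CREATE TABLE sbtest (
  id  INT UNSIGNED NOT NULL,
  k   INT UNSIGNED NOT NULL DEFAULT 0,
  c   CHAR(120) NOT NULL DEFAULT '',
  pad CHAR(60) NOT NULL DEFAULT '',
  PRIMARY KEY (id)
) ENGINE=InnoDB;
-- Workload 1: plain hot-row increment (c is coerced to a number).
UPDATE sbtest SET c = c + 1 WHERE id = 1;
-- Workload 2: upsert that always takes the ON DUPLICATE KEY branch.
INSERT INTO sbtest VALUES (1, 0, '0', 'ayyyyyyyyy')
  ON DUPLICATE KEY UPDATE c = c + 1;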
The tests were run for MySQL 5.1.45 with the Facebook patch for MySQL 5.1 using the PBXT 1.1, InnoDB plugin 1.0.6 and MyISAM engines. Several configurations were run:
- binlog enabled, innodb deadlock detection disabled/enabled
- binlog disabled, innodb deadlock detection disabled/enabled
- binlog disabled, innodb deadlock detection disabled/enabled, fsync on commit disabled
The Facebook patch for MySQL includes a change that significantly reduces the overhead from InnoDB deadlock detection, courtesy of MySQL support. This patch will soon be in an official release. I don't include results without the patch. They are much worse. Even with this patch the overhead from InnoDB deadlock detection can be significant. With the configuration below the database is cached. The server uses HW RAID with a battery-backed write cache, so O_DIRECT writes and fsync are fast. The basic my.cnf values are:
pbxt_index_cache_size=250M
pbxt_record_cache_size=750M
pbxt_flush_log_at_trx_commit=1
innodb_buffer_pool_size=1000M
innodb_log_file_size=100M
innodb_flush_log_at_trx_commit=1
innodb_doublewrite=0
innodb_flush_method=O_DIRECT
innodb_thread_concurrency=0
innodb_max_dirty_pages_pct=80
innodb_file_per_table
innodb_file_format=barracuda
max_connections=2000
table_cache=2000
key_buffer_size=1000M
The value innodb_deadlock_detect=0 was added for the tests that disabled InnoDB deadlock detection. For the tests in which fsync on commit was disabled, these values were appended to my.cnf: innodb_flush_log_at_trx_commit=2 and pbxt_flush_log_at_trx_commit. For tests that used the binlog it was enabled by the following options:
log_bin
binlog_format=row
innodb_autoinc_lock_mode=2
sync_binlog=1
For all of the results:
- innodb-ndl is the 1.0.6 InnoDB plugin with innodb_deadlock_detect=0 used to disable InnoDB deadlock detection. Row lock timeouts would still resolve deadlocks, but only after the timeout expires. This option is in the Facebook patch for MySQL 5.1.
- All results are reported for 1, 2, 4, 8, 16, 32, 64, 128, 256, 512 and 1024 concurrent clients.
- The mysqld server and sysbench clients ran on the same 16-core server.
- Results are transactions per second (TPS) reported by sysbench where each transaction is one UPDATE or INSERT statement.
Results for concurrent UPDATE
The statement was UPDATE sbtest set c=c+1 where id=1
TPS with the binlog enabled:
innodb-ndl 1263 2076 2373 2556 2448 2537 2584 2483 2350 1949 1495
innodb 1633 2313 2395 2522 2492 2498 2360 1457 576 155 43
pbxt 1626 2113 2214 2250 2217 2149 2008 1906 1689 1368 1088
myisam 2725 3844 3767 3880 4129 3947 4523 3857 3906 3876 4021
TPS with the binlog disabled:
innodb-ndl 6453 11650 16576 12242 12672 12431 12044 10691 7652 4747 2603
innodb 5046 11760 16710 11775 12299 10348 5898 2173 674 165 50
pbxt 6775 12982 15101 11033 10545 9175 8155 6292 4512 2885 1809
myisam 9186 12931 9638 9319 9720 9515 8851 9529 8680 7827 8691
Results for concurrent INSERT ON DUPLICATE KEY UPDATE
The statement was INSERT INTO sbtest values(1,0,'0','ayyyyyyyyy') ON DUPLICATE KEY UPDATE c = c + 1. The value 0 is listed for some results from PBXT because of bug 551250. 
TPS with the binlog enabled:
innodb-ndl 1502 2300 2483 2205 2563 2614 2436 2381 2260 1934 1507
innodb 1262 2481 2678 2547 2306 2433 2225 1483 594 163 43
pbxt 1643 2132 1925 1832 1548 1291 895 557 0 0 0
myisam 2573 4145 4088 3704 4043 4442 4611 4694 4280 4265 4037
TPS with the binlog disabled:
innodb-ndl 2704 4990 9772 13543 12109 11747 11123 9706 7316 4696 2480
innodb 2271 5878 9488 13494 11631 9760 5754 2277 700 172 56
pbxt 2112 4167 2581 1914 1170 46 0 0 0 0 0
myisam 9350 15388 11897 10484 11235 11178 11036 10405 10008 8727 10829
TPS with the binlog disabled and fsync on commit disabled:
innodb-ndl 6092 11126 14765 11054 11308 10975 10854 9424 7165 4491 2524
innodb 5908 11379 14585 10734 10891 9372 5554 2152 680 170 45
pbxt 5289 0 0 0 0 0 5036 2681 2238 646 153
myisam 9350 15388 11897 10484 11235 11178 11036 10405 10008 8727 10829
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Tue, 30 Mar 2010 03:26:09 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:6892:"Using the configuration <a href="http://www.facebook.com/notes/mysqlfacebook/pbxt-still-looks-good/379934640932">described elsewhere</a> I ran a <a href="http://code.launchpad.net/~mdcallag/sysbench/0.4-incupdate">modified version of sysbench</a> to compare PBXT, InnoDB and MyISAM using two update-intensive workloads on a CPU-bound server. The workloads are simple but reproduce a bottleneck that is intermittent on some of our MySQL servers. Each workload is one statement run by many concurrent connections. In both cases the statements increment a column in one very popular row. One test uses an update statement like <b>UPDATE sbtest set c=c+1 where id=1</b>. The other test uses an insert statement like <b>INSERT INTO sbtest values(1,0,'0','ayyyyyyyyy') ON DUPLICATE KEY UPDATE c = c + 1</b> and always executes the <b>ON DUPLICATE KEY UPDATE</b> clause. Note that this is a very narrow benchmark (nothing but updates, CPU-bound server) but it tests a case that I care about. My summary of the results is: <ul> <li>There is significant overhead from using the binlog with <b>sync_binlog=1</b> even when fsync is fast. We are trying to fix this by supporting group commit for the binlog fsync. Ryan McElroy is working on it and will talk about this <a href="http://en.oreilly.com/mysql2010/public/schedule/speaker/76498">at the MySQL conference</a>. The MySQL replication and MariaDB teams are also working on the problem. We hope to learn that the InnoDB team is also working on a fix. <li>InnoDB is much faster when deadlock detection is disabled for workloads with a lot of concurrency and contention. <li>PBXT continues to be competitive. <li>PBXT needs to get better at INSERT ON DUPLICATE KEY UPDATE. <li>MyISAM is faster. This doesn't surprise me. I am surprised that InnoDB is very close to MyISAM on some of the tests. </ul> The tests were run for MySQL 5.1.45 with the <a href="http://launchpad.net/mysqlatfacebook/51Facebook">Facebook patch for MySQL 5.1</a> using the PBXT 1.1, InnoDB plugin 1.0.6 and MyISAM engines. 
Several configurations were run: <ul> <li>binlog enabled, innodb deadlock detection disabled/enabled <li>binlog disabled, innodb deadlock detection disabled/enabled <li>binlog disabled, innodb deadlock detection disabled/enabled, fsync on commit disabled </ul> The Facebook patch for MySQL includes a change that significantly reduces the overhead from InnoDB deadlock detection <a href="http://bugs.mysql.com/bug.php?id=49047">courtesy of MySQL support</a>. This patch will soon be in an official release. I don't include results without the patch. They are much worse. Even with this patch the overhead from InnoDB deadlock detection can be significant. With the configuration below the database is cached. The server uses HW RAID with a battery-backed write cache so O_DIRECT writes and fsync are fast. The basic my.cnf values are: <pre>
pbxt_index_cache_size=250M
pbxt_record_cache_size=750M
pbxt_flush_log_at_trx_commit=1
innodb_buffer_pool_size=1000M
innodb_log_file_size=100M
innodb_flush_log_at_trx_commit=1
innodb_doublewrite=0
innodb_flush_method=O_DIRECT
innodb_thread_concurrency=0
innodb_max_dirty_pages_pct=80
innodb_file_per_table
innodb_file_format=barracuda
max_connections=2000
table_cache=2000
key_buffer_size=1000M
</pre> The value <b>innodb_deadlock_detect=0</b> was added for the tests that disabled InnoDB deadlock detection. For the tests in which fsync on commit was disabled, these values were appended to my.cnf: <b>innodb_flush_log_at_trx_commit=2</b> and <b>pbxt_flush_log_at_trx_commit</b>. For tests that used the binlog it was enabled by the following options: <pre>
log_bin
binlog_format=row
innodb_autoinc_lock_mode=2
sync_binlog=1
</pre> For all of the results: <ul> <li><b>innodb-ndl</b> is the 1.0.6 InnoDB plugin with <b>innodb_deadlock_detect=0</b> used to disable InnoDB deadlock detection. Row lock timeouts would still resolve deadlocks, but only after the timeout expires. This option is in the Facebook patch for MySQL 5.1. <li>All results are reported for 1, 2, 4, 8, 16, 32, 64, 128, 256, 512 and 1024 concurrent clients. <li>The mysqld server and sysbench clients ran on the same 16-core server. <li>Results are transactions per second (TPS) reported by sysbench where each transaction is one UPDATE or INSERT statement. </ul> <h2>Results for concurrent UPDATE</h2> The statement was <b>UPDATE sbtest set c=c+1 where id=1</b> TPS with the binlog enabled: <pre>
innodb-ndl 1263 2076 2373 2556 2448 2537 2584 2483 2350 1949 1495
innodb 1633 2313 2395 2522 2492 2498 2360 1457 576 155 43
pbxt 1626 2113 2214 2250 2217 2149 2008 1906 1689 1368 1088
myisam 2725 3844 3767 3880 4129 3947 4523 3857 3906 3876 4021
</pre> TPS with the binlog disabled: <pre>
innodb-ndl 6453 11650 16576 12242 12672 12431 12044 10691 7652 4747 2603
innodb 5046 11760 16710 11775 12299 10348 5898 2173 674 165 50
pbxt 6775 12982 15101 11033 10545 9175 8155 6292 4512 2885 1809
myisam 9186 12931 9638 9319 9720 9515 8851 9529 8680 7827 8691
</pre> <h2>Results for concurrent INSERT ON DUPLICATE KEY UPDATE</h2> The statement was <b>INSERT INTO sbtest values(1,0,'0','ayyyyyyyyy') ON DUPLICATE KEY UPDATE c = c + 1</b>. The value 0 is listed for some results from PBXT because of <a href="http://bugs.launchpad.net/pbxt/+bug/551250">bug 551250</a>. 
TPS with the binlog enabled: <pre>
innodb-ndl 1502 2300 2483 2205 2563 2614 2436 2381 2260 1934 1507
innodb 1262 2481 2678 2547 2306 2433 2225 1483 594 163 43
pbxt 1643 2132 1925 1832 1548 1291 895 557 0 0 0
myisam 2573 4145 4088 3704 4043 4442 4611 4694 4280 4265 4037
</pre> TPS with the binlog disabled: <pre>
innodb-ndl 2704 4990 9772 13543 12109 11747 11123 9706 7316 4696 2480
innodb 2271 5878 9488 13494 11631 9760 5754 2277 700 172 56
pbxt 2112 4167 2581 1914 1170 46 0 0 0 0 0
myisam 9350 15388 11897 10484 11235 11178 11036 10405 10008 8727 10829
</pre> TPS with the binlog disabled and fsync on commit disabled: <pre>
innodb-ndl 6092 11126 14765 11054 11308 10975 10854 9424 7165 4491 2524
innodb 5908 11379 14585 10734 10891 9372 5554 2152 680 170 45
pbxt 5289 0 0 0 0 0 5036 2681 2238 646 153
myisam 9350 15388 11897 10484 11235 11178 11036 10405 10008 8727 10829
</pre>";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:7;a:6:{s:4:"data";s:68:" ";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:67:"More Debate, More Flame, More Choosing the correct tool for the job";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:32:"http://www.bigdbahead.com/?p=714";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:32:"http://www.bigdbahead.com/?p=714";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:5854:" You have to love all the debating going on over NOSQL -vs- SQL, don’t you? With my UC session on choosing the right data storage tools (does this sound better than SQL-vs-NoSQL?) I have been trying to stay current with the mood of the community so I can make my talk more relevant. Today I was catching up on reading a few blog posts and I thought I would pass along these two: Pro SQL and Pro NoSQL … they represent two very different views on the subject. (Note: I think there are misleading facts and figures in these that should be fleshed out more, but they are a good sample of what I am talking about.) Sure, lots of people have posted on this and even talked about it (I am sure you have all seen Brian’s NOSQL -vs- MySQL presentation from OpenSQL Camp last year). You see, there is a huge, angry, bitter flame war over who is right and who is wrong. People have very strong opinions on whether SQL or NOSQL is the anti-christ. We should organize a debate some time. So who is right? My opinion is that no one is. The fact is, if a solution meets your needs and it works, it is not wrong (it may have flaws or risks to different degrees). In the case of an RDBMS -vs- NOSQL, for some applications one is better than the other. The issue I think we all run into is not really the merit of NOSQL -vs- a traditional RDBMS; it’s the willingness to accept alternative views. Too many shops out in the world are all about the new hotness and not about what’s best for their application or organization. 
Other people, meanwhile, would rather die than allow their database to be taken away from them. For some apps durability is not a big deal; for others it is. Everyone has different requirements. Just because Digg or Twitter or Rackspace is doing NOSQL and it works for them does not mean you have to use it, or that it will even work for you. In fact, if you leap without thinking you may hurt yourself more than solve your problems. Every situation is unique, so before you jump head first into one solution or another, take a breath and analyse the situation. Ask questions like: Why are we thinking about NOSQL? Is it just because of HA (hey, RDBMSs can handle that!)? Is it to replace sharding? Is it to do something else? … Ask yourself about the work you need to do: do you need to do complex joins? How much data will you really have? What sort of workload do you have? Really define your goal, then research and test solutions. I am sure that the big names using Cassandra or HBase did not read a blog post somewhere and start converting everything that day, and you should not either. Also, be careful of all the analyses, opinions, benchmarks, etc. you see on the web on this topic. These are specific to a certain workload or user. Take Joe’s post (the pro-NoSQL one from above); he says “Anyone out there running an EC2 large instance with a RDBMS on it that’s doing 1,800 reads/second? I’ve got a Cassandra node that was getting hammered with a load of 6 serving that much traffic without falling over..” Taken out of context I could say: well hell, my laptop this morning got 1,200 reads/second on Cassandra and 4,000 reads/second with InnoDB. Does that mean MySQL is more than 3x better than Cassandra? Well, in a certain workload, under certain conditions, sure… but I can write another benchmark that shows the opposite. By the way, yes, I have gotten well more than 1,800 reads/second on an EC2 large instance… but the workload is probably so different that it’s a worthless comparison. Facts and figures can be used to sway opinions, especially when variables are unknown. Let me show you what I mean. One of my colleagues was getting 55K read/write operations per second on a new server the other day. Joe (Joe, I am not picking on you directly, really) posted that he gets 1,800/s on a large EC2 server. That **could** mean that Cassandra would need 31 large EC2 instances to match the power of that one server. At $2,978.40 per AWS large instance, that is a cost of about $92,330 per year. It’s over 3x the cost of the particular server that achieved 55K ops. Who would want to pay 3x more for the same performance, right? This proves SQL is awesome and NOSQL sucks, right? The answer is NO. Again, the workloads are probably so different that one may lend itself better to SQL. What if Joe has 1TB of data and I only have 100G? Well, that changes the equation, and we would have to adjust to account for that. In this case, with 31 servers, if I could process 31TB of data at that consistent speed, then it may be worth it, depending on how long it takes a single RDBMS to deliver results over 31TB. I guess I am trying to say: make a decision based on your own tests and your own workload. There is nothing wrong with considering either option, as they both have their merits and their place in the world :) There certainly is nothing wrong with listening to all of the banter about our experiences and our opinions. 
But even if really smart people tell you all kinds of reasons why NOSQL is better than an RDBMS, or other equally smart people tell you why an RDBMS is better than a NOSQL solution, evaluate for yourself and make an informed decision. A lot of these smart people are looking at the problem from their own unique experience. If someone had a bad experience with MySQL and did not have a good DBA, they may view MySQL in a very negative light. Similarly, if you have optimized, developed, and improved MySQL over the years, you may view NOSQL solutions as foreign and filled with risk. Also remember that really smart people sometimes do really dumb things (I could talk about all the really smart people I know, and the rather non-common-sense approaches they have tried because they are so close to a problem). ";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Mon, 29 Mar 2010 20:17:17 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:6:{i:0;a:5:{s:4:"data";s:4:"Matt";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:5:"NOSQL";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:9:"benchmark";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:3;a:5:{s:4:"data";s:5:"linux";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:4;a:5:{s:4:"data";s:5:"mysql";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:5;a:5:{s:4:"data";s:11:"performance";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:6411:"<p> You have to love all the debating going on over NOSQL -vs- SQL, don’t you? With my <a href="http://en.oreilly.com/mysql2010/public/schedule/detail/12685">UC session</a> on choosing the right data storage tools (does this sound better than SQL-vs-NoSQL?) I have been trying to stay current with the mood of the community so I can make my talk more relevant. Today I was catching up on reading a few blog posts and I thought I would pass along these two: <a href="http://www.yafla.com/dforbes/The_Impact_of_SSDs_on_Database_Performance_and_the_Performance_Paradox_of_Data_Explodification/">Pro SQL</a> and <a href="http://stu.mp/2010/03/nosql-vs-rdbms-let-the-flames-begin.html">Pro NoSQL</a> … they represent two very different views on the subject. (Note: I think there are misleading facts and figures in these that should be fleshed out more, but they are a good sample of what I am talking about.) Sure, lots of people have posted on this and even talked about it (I am sure you have all seen Brian’s NOSQL -vs- MySQL presentation from OpenSQL Camp last year). You see, there is a huge, angry, bitter flame war over who is right and who is wrong. People have very strong opinions on whether SQL or NOSQL is the anti-christ. We should organize a debate some time. So who is right? My opinion is that no one is. </p> <p> The fact is, if a solution meets your needs and it works, it is not wrong (it may have flaws or risks to different degrees). In the case of an RDBMS -vs- NOSQL, for some applications one is better than the other. 
The issue I think we all run into is not really the merit of NOSQL -vs- a traditional RDBMS; it’s the willingness to accept alternative views. Too many shops out in the world are all about the new hotness and not about what’s best for their application or organization. Other people, meanwhile, would rather die than allow their database to be taken away from them. For some apps durability is not a big deal; for others it is. Everyone has different requirements. Just because Digg or Twitter or Rackspace is doing NOSQL and it works for them does not mean you have to use it, or that it will even work for you. In fact, if you leap without thinking you may hurt yourself more than solve your problems. Every situation is unique, so before you jump head first into one solution or another, take a breath and analyse the situation. Ask questions like: Why are we thinking about NOSQL? Is it just because of HA (hey, RDBMSs can handle that!)? Is it to replace sharding? Is it to do something else? … Ask yourself about the work you need to do: do you need to do complex joins? How much data will you really have? What sort of workload do you have? Really define your goal, then research and test solutions. I am sure that the big names using Cassandra or HBase did not read a blog post somewhere and start converting everything that day, and you should not either.<br /> <span></span><br /> Also, be careful of all the analyses, opinions, benchmarks, etc. you see on the web on this topic. These are specific to a certain workload or user. Take Joe’s post (the pro-NoSQL one from above); he says “Anyone out there running an EC2 large instance with a RDBMS on it that’s doing 1,800 reads/second? I’ve got a Cassandra node that was getting hammered with a load of 6 serving that much traffic without falling over..” Taken out of context I could say: well hell, my laptop this morning got 1,200 reads/second on Cassandra and 4,000 reads/second with InnoDB. Does that mean MySQL is more than 3x better than Cassandra? Well, in a certain workload, under certain conditions, sure… but I can write another benchmark that shows the opposite. By the way, yes, I have gotten well more than 1,800 reads/second on an EC2 large instance… but the workload is probably so different that it’s a worthless comparison. </p> <p> Facts and figures can be used to sway opinions, especially when variables are unknown. Let me show you what I mean. One of my colleagues was getting 55K read/write operations per second on a new server the other day. Joe (Joe, I am not picking on you directly, really) posted that he gets 1,800/s on a large EC2 server. That **could** mean that Cassandra would need 31 large EC2 instances to match the power of that one server. At $2,978.40 per AWS large instance, that is a cost of about $92,330 per year. It’s over 3x the cost of the particular server that achieved 55K ops. Who would want to pay 3x more for the same performance, right? This proves SQL is awesome and NOSQL sucks, right? The answer is NO. Again, the workloads are probably so different that one may lend itself better to SQL. What if Joe has 1TB of data and I only have 100G? Well, that changes the equation, and we would have to adjust to account for that. In this case, with 31 servers, if I could process 31TB of data at that consistent speed, then it may be worth it, depending on how long it takes a single RDBMS to deliver results over 31TB. </p> <p> I guess I am trying to say: make a decision based on your own tests and your own workload. 
There is nothing wrong with considering either option, as they both have their merits and their place in the world :) There certainly is nothing wrong with listening to all of the banter about our experiences and our opinions. But even if really smart people tell you all kinds of reasons why NOSQL is better than an RDBMS, or other equally smart people tell you why an RDBMS is better than a NOSQL solution, evaluate for yourself and make an informed decision. A lot of these smart people are looking at the problem from their own unique experience. If someone had a bad experience with MySQL and did not have a good DBA, they may view MySQL in a very negative light. Similarly, if you have optimized, developed, and improved MySQL over the years, you may view NOSQL solutions as foreign and filled with risk. Also remember that really smart people sometimes do really dumb things (I could talk about all the really smart people I know, and the rather non-common-sense approaches they have tried because they are so close to a problem). </p>";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:10:"BigDBAHead";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:8;a:6:{s:4:"data";s:73:" ";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:57:"New Benchmark I am working on that tests MYSQL -vs- NOSQL";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:32:"http://www.bigdbahead.com/?p=702";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:32:"http://www.bigdbahead.com/?p=702";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:2900:" I am giving a talk in a couple of weeks at the 2010 MySQL User Conference that will touch on use cases for NOSQL tools -vs- more relational tools; the talk is entitled “Choosing the Right Tools for the Job, SQL or NOSQL”. This talk is NOT supposed to be a deep dive into the good, bad, and ugly of these solutions, but rather a way to discuss potential use cases for various solutions and where they may make a lot of sense; still, being me, I felt a need to do at least some minor benchmarking of these solutions. The series of posts I wrote last year over on mysqlperformanceblog.com comparing Tokyo Tyrant to both MySQL and Memcached was fairly popular. In fact, the initial set of benchmark scripts I used for that series has been put to good use since then, testing out things like a pair of Gear6 appliances, memcachedb, new memcached versions, and various memcached APIs. When I started really digging into some of the other popular NoSQL solutions to expand my benchmarks, it became apparent that most of these tools have fairly well-defined APIs for Ruby; the APIs for Perl, however, in some cases may not exist at all or are rather immature at this point. 
So I decided to rewrite my initial benchmark suite in Perl. With the help of my co-presenter for this talk (Yves), we are writing a tool that will hopefully be able to run the same basic tests against a wide variety of solutions. Currently I have tests written for Tyrant, Memcached, Cassandra, and MySQL. We will be expanding these tests to include Redis and MongoDB for sure (maybe NDB); beyond that I am not 100% sure. The challenge is going to be writing code that not only tests basic features, but can also test the advanced features of these solutions. After all, a simple PK lookup can be done on all of these solutions, but that’s not necessarily the bread and butter of a solution like MongoDB or even Cassandra. It’s the extra features that make these more compelling. We will be releasing the code when it’s ready. I have not started my more exhaustive benchmarks yet, as I am still writing parts of the suite, but I have been running a few. I generally hate publishing or mentioning results until I have taken the time to analyse them and ensure I did not miss anything, but what the hell. In a very short read-only test, using PK-based lookups to compare InnoDB -vs- Cassandra -vs- memcached (a really small data set that easily fits in memory on my laptop, **single node**), I end up averaging ~1.2K reads per second from Cassandra, ~4K reads per second from InnoDB, and ~17K reads per second from memcached. Now, as I set up more benchmarks, I will test multi-node performance, tune the configs for the workload, etc., but it is interesting to see the early performance difference. More later.";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Mon, 29 Mar 2010 14:57:10 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:7:{i:0;a:5:{s:4:"data";s:4:"Matt";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:5:"NOSQL";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:5:"Tokyo";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:3;a:5:{s:4:"data";s:9:"benchmark";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:4;a:5:{s:4:"data";s:5:"linux";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:5;a:5:{s:4:"data";s:5:"mysql";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:6;a:5:{s:4:"data";s:11:"performance";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:3282:"<p> I am giving a talk in a couple of weeks at the 2010 MySQL User Conference that will touch on use cases for NOSQL tools -vs- more relational tools; the talk is entitled <a href="http://en.oreilly.com/mysql2010/public/schedule/detail/12685">“Choosing the Right Tools for the Job, SQL or NOSQL”</a>. This talk is NOT supposed to be a deep dive into the good, bad, and ugly of these solutions, but rather a way to discuss potential use cases for various solutions and where they may make a lot of sense; still, being me, I felt a need to do at least some minor benchmarking of these solutions. 
The series of posts I wrote last year over on <a href="http://www.mysqlperformanceblog.com/category/nosql/">mysqlperformanceblog.com</a> comparing Tokyo Tyrant to both MySQL and Memcached was fairly popular. In fact, the initial set of benchmark scripts I used for that series has been put to good use since then, testing out things like a pair of Gear6 appliances, memcachedb, new memcached versions, and various memcached APIs. </p> <p> When I started really digging into some of the other popular NoSQL solutions to expand my benchmarks, it became apparent that most of these tools have fairly well-defined APIs for Ruby; the APIs for Perl, however, in some cases may not exist at all or are rather immature at this point. So I decided to rewrite my initial benchmark suite in Perl. With the help of my co-presenter for this talk (Yves), we are writing a tool that will hopefully be able to run the same basic tests against a wide variety of solutions. Currently I have tests written for Tyrant, Memcached, Cassandra, and MySQL. We will be expanding these tests to include Redis and MongoDB for sure (maybe NDB); beyond that I am not 100% sure. The challenge is going to be writing code that not only tests basic features, but can also test the advanced features of these solutions. After all, a simple PK lookup can be done on all of these solutions, but that’s not necessarily the bread and butter of a solution like MongoDB or even Cassandra. It’s the extra features that make these more compelling. We will be releasing the code when it’s ready. </p> <p> I have not started my more exhaustive benchmarks yet, as I am still writing parts of the suite, but I have been running a few. I generally hate publishing or mentioning results until I have taken the time to analyse them and ensure I did not miss anything, but what the hell. In a very short read-only test, using PK-based lookups to compare InnoDB -vs- Cassandra -vs- memcached (a really small data set that easily fits in memory on my laptop, **single node**), I end up averaging ~1.2K reads per second from Cassandra, ~4K reads per second from InnoDB, and ~17K reads per second from memcached. Now, as I set up more benchmarks, I will test multi-node performance, tune the configs for the workload, etc., but it is interesting to see the early performance difference. 
</p> <p>More later.</p><br/>PlanetMySQL Voting: <a href="http://planet.mysql.com/entry/vote/?entry_id=24083&vote=1&apivote=1">Vote UP</a> / <a href="http://planet.mysql.com/entry/vote/?entry_id=24083&vote=-1&apivote=1">Vote DOWN</a>";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:10:"BigDBAHead";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:9;a:6:{s:4:"data";s:53:" ";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:41:"MySQL related files and basic information";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:40:"http://kedar.nitty-witty.com/blog/?p=726";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:76:"http://kedar.nitty-witty.com/blog/mysql-related-files-and-basic-information/";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:493:"This post covers the basic information of files that MySQL Server uses / creates for various tasks. my.cnf : It is the main configuration file for MySQL. You may find it under base directory in windows or under /etc/. .sock : mysqld creates a socket for programs to connect to and notes in this file. It is named [...] Related posts:Load Delimited Data (csv, excel) into MySQL Server Calculate Mysql Memory Usage – Quick Stored Proc Difference MyISAM and InnoDB Storage Engines Mysql ";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Mon, 29 Mar 2010 12:43:14 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:3:{i:0;a:5:{s:4:"data";s:9:"technical";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:5:"files";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:5:"mysql";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:1304:"This post covers the basic information of files that MySQL Server uses / creates for various tasks. my.cnf : It is the main configuration file for MySQL. You may find it under base directory in windows or under /etc/. .sock : mysqld creates a socket for programs to connect to and notes in this file. It is named [...] 
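The file locations described above can be confirmed directly from a running server; a minimal illustration using standard MySQL statements:

-- Ask the server where its files actually live.
SHOW VARIABLES LIKE 'socket';    -- full path of the .sock file
SHOW VARIABLES LIKE 'datadir';   -- the data directory
SHOW VARIABLES LIKE 'basedir';   -- the base installation directory
SHOW VARIABLES LIKE 'pid_file';  -- the server's process ID file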
Related posts: Load Delimited Data (csv, excel) into MySQL Server (http://kedar.nitty-witty.com/blog/load-delimited-data-csv-excel-into-mysql-server/); Calculate Mysql Memory Usage – Quick Stored Proc (http://kedar.nitty-witty.com/blog/calculte-mysql-memory-usage-quick-stored-proc/); Difference MyISAM and InnoDB Storage Engines Mysql (http://kedar.nitty-witty.com/blog/difference-myisam-and-innodb-storage-engines-mysql/).

Posted by Kedar on Mon, 29 Mar 2010.

OK, you have waited long enough, here's my take on NoSQL (http://karlssonondatabases.blogspot.com/2010/03/ok-you-have-waited-long-enough-heres-my.html)

I have to admit I'm getting old, and I am now scarily close to being a sad old git; not only that, I'm a database guy who has worked with SQL-based relational databases for more than 20 years now.

So considering my age and heritage, I really should just dismiss the NoSQL movement as something for those young kids with earrings and a baseball cap (and, to add insult to injury, the cap worn backwards), and assume that any serious database dude like myself, with my loads of experience (like the invaluable experience of having run Oracle on a Wang word processor; real valuable stuff, I tell you), knows better. But no, you will not hear me do that. Nor will you hear me say that NoSQL key-value stores will replace all SQL databases within 5 years (if I worked for an analyst and was paid dearly to say things like that, I might have, though. Any takers?).

My take is actually quite simple. The strength of the relational model is that it is incredibly generic. The lack of a specific order and hierarchy makes it even more generic. I think few people would argue that more or less all applications served by NoSQL could just as well be served by a SQL database, if it wasn't for two things:

- Scalability: the lack of the strict consistency rules of the RDBMS heritage makes most NoSQL implementations much more scalable.
  The very nature of most NoSQL stores is distributed, and the lack of strict distributed consistency makes this distribution scalable beyond what is usually possible with an RDBMS on the same platform.

- Performance: largely a consequence of the above; a NoSQL store being more scalable makes it easier to cram more performance out of it.

Now, with all this in mind, am I saying that NoSQL has all the advantages of an RDBMS, but with much better scalability? Nope, no way, José.

The strict consistency requirements of an RDBMS are also an advantage. It's not, if I understand them correctly, that the proponents of NoSQL stores think consistency is bad; it's just that they don't want to pay the price for it in terms of performance. And to be frank, although in many cases data inconsistency is acceptable, it still has to be controlled: uncontrolled inconsistency, where you don't know how inconsistent your store is or in what way, is not something we want. So even a NoSQL store is limited.

So it all comes down to performance then. We sacrifice consistency to gain performance through scalability. Right? If you agree with that, then I think NoSQL is not a long-term solution. I am not saying "NoSQL is for kids, real databases need SQL"; that was largely the argument against SQL-based databases in the 1980s, when hierarchical databases still ruled and SQL was thought to carry too much overhead. The difference is that SQL had higher functionality than the competing technologies of the 1980s, but in many cases not enough performance. But performance is bound to go up. All the time. And for much less money, at least for a while to come. Look at virtualization: I've been a proponent of it for quite a while, and just a few years back the argument against it was that "performance sux". Compared to raw iron maybe it did, but that wasn't the point. The point was: did I get enough performance? In many cases you did, with an environment that was a lot easier to manage, at a lower cost.

What this means to me is that there is a place for NoSQL stores right now, where the performance and size requirements are really high and where one is willing to compromise on consistency. But a technology that limits functionality, features, and ease of use for the sake of performance will remain a niche technology. That doesn't mean it's useless, quite the opposite; I'm a pragmatist at heart, and whatever works, works. But if I had the choice of storing my data in a consistent or an inconsistent state, and both solutions provided enough performance for my needs, I'd go consistent any time.

And then there is one more thing. The scalability of NoSQL stores is largely due to their distributed nature. There are arguments out there that say you cannot create a consistent, distributed, scalable datastore. I think you can; I'm convinced of it, actually.
There may be other compromises needed to achieve that, but that it can be done I am sure.

/Karlsson

Posted by Anders Karlsson on Mon, 29 Mar 2010.
MariaDB talk at the OpenSourceDays 2010 conference (http://kristiannielsen.livejournal.com/11602.html)

Earlier this month I was at the OpenSourceDays 2010 conference, giving a talk on MariaDB (the slides from the talk are available at http://askmonty.org/w/images/a/a4/Osd2010.pdf). The talk went quite well, I think (though I probably talked way too fast, as I usually do; at least that means I finished on time with plenty of room for questions). There was quite a bit of interest after the talk from many of the people who heard it. It was even reported on by the Danish IT media version2.dk (article in Danish: http://www.version2.dk/artikel/14131-mysqls-lillesoester-loesner-grebet-fra-sun-og-oracle).
Especially interesting to me was discussing with three people from the Danish site komogvind.dk (http://www.komogvind.dk/), who told me fascinating details about their work keeping a busy site running; one of them even went right home to benchmark against MariaDB. Thanks to you, and to everyone else, for your interest and time!

This time I tried to also focus on the community and development aspects of MariaDB (in addition to the mandatory feature list and benchmark graphs, of course). To me, the most important thing about MariaDB is that we now have the infrastructure and community for people outside of MySQL to do full-scale development at the same level as inside MySQL. This was missing before. It is a much less concrete thing than features and benchmarks, so I found it much harder to present well without it turning into nothing but buzzwords. But from the feedback I got afterwards, it seems I succeeded pretty well with this part too, which I am especially happy about!

The talk was recorded on video by the organisers. The latest I heard was that the footage is still being edited (I was waiting, hoping to be able to include a link to the video in this post). If they manage to finish the editing and make the videos available later, I will post an update.

A big thanks to the organisers of the OpenSourceDays 2010 conference! I had a great time, and hope to be back again next spring for OpenSourceDays 2011.

Posted by Kristian Nielsen on Mon, 29 Mar 2010.
New Tungsten Software Releases for MySQL and PostgreSQL (http://scale-out-blog.blogspot.com/2010/03/new-tungsten-software-releases-for.html)

I would like to announce a couple of new Tungsten versions, available for your database clustering enjoyment. As most readers of this blog are aware, Tungsten allows users to create highly available data services that include replicated copies, distributed management, and application connectivity, using unaltered open source databases. We are continually improving the software and have a raft of new features coming out this year.

First, there is a new Tungsten 1.2.3 maintenance release, available in both commercial and open source editions. You can get the commercial version on the Continuent website (http://www.continuent.com/downloads), while the open source version is available on SourceForge (https://sourceforge.net/projects/tungsten/). The Tungsten 1.2.3 release focuses on improvements for MySQL users, including the following:

- Transparent session consistency for multi-tenant applications.
  This allows applications that follow some simple conventions, such as sharding tenant data by database, to get automatic read scaling to slaves without making code changes (see the sketch after this post).

- A greatly improved script for purging history on Tungsten Replicator.

- Fixes to binlog extraction to handle ENUM and SET data types correctly.

By far the biggest improvement in this release is the Tungsten product documentation (http://www.continuent.com/downloads/documentation), including major rewrites of the guides covering management and connectivity. Even the release notes are better. If you want to find out how Tungsten works, start with the new Tungsten Concepts and Administration Guide.

Second, there is a new Tungsten 1.3 release coming out soon. Commercial versions are already in use at selected customer sites, and you can build the open source version by downloading the code from SVN on SourceForge (http://sourceforge.net/projects/tungsten/develop). The Tungsten 1.3 release sports major feature additions in the following areas:

- A new replicator architecture that allows you to manage non-Tungsten replication and to configure very flexible replication flows that use multi-core systems more effectively and implement complex replication topologies. The core processing loop for replication can now cycle through 700,000 events per second on my laptop; it's really quick.

- Much improved support for PostgreSQL warm standby clustering, as well as provisional management of new PostgreSQL 9 features like streaming replication and hot standby.

- Replication support for just about everything in the MySQL binlog: large transactions, unsigned characters, session variables, various permutations of character sets and binary data, and the ability to download binlog files through the MySQL client protocol. If you can put it in the binlog, we can replicate it.

We also have provisional support for Drizzle thanks to Markus Ericsson (http://developian.blogspot.com/2009/10/replication-from-mysql-to-drizzle-using.html), plus a raft of other improvements. This has been a huge amount of work all around, so I hope you'll enjoy the results.

P.S. Contact Continuent if you want to be a beta test site for Tungsten 1.3.

Posted by Robert Hodges on Mon, 29 Mar 2010.
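The "sharding tenant data by database" convention mentioned above is an application-side pattern rather than Tungsten syntax; a hypothetical sketch of what it looks like (database and table names are invented for illustration):

-- Each tenant lives in its own database.
CREATE DATABASE tenant_1001;
CREATE DATABASE tenant_1002;

-- The application picks the tenant's database up front...
USE tenant_1001;

-- ...so a connectivity layer can route this tenant's reads to a slave
-- while keeping the session's view of that tenant's data consistent.
SELECT * FROM orders WHERE order_id = 42;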
The actual range and storage size of an INT (http://openquery.com/blog/actual-range-storage-size-int)

What's the difference between INT(2) and INT(20)? Not a lot. It's about output formatting, which you'll never encounter when talking to the server through an API (as you do from most application languages).

The confusion stems from the fact that with CHAR(n) and VARCHAR(n), the (n) signifies the length or maximum length of the field. But for INT, the range and storage size are specified by using different data types: TINYINT, SMALLINT, MEDIUMINT, INT (aka INTEGER), BIGINT.

At Open Query we tend to pick on things like INT(2) when reviewing a client's schema, because chances are the developers/DBAs are working under a mistaken assumption, and this could cause trouble somewhere, even if not in the exact spot where we pick on it. So it's a case of pattern recognition.

A very practical example of this comes from a client I worked with last week. I first spotted some harmless ones, we talked about it, and then we hit the jackpot: INT(22) or something, which in fact was storing a unix timestamp converted to int by the application, for the purpose of, wait for it, the user's birth date. There are a number of things wrong with this, and the result is something that doesn't work properly.

Currently, the unix epoch/timestamp, when stored in binary, is a 32-bit unsigned integer, with a range from 1970-01-01 to somewhere in 2037. Note the unsigned qualifier; otherwise it already wraps around 2004.

- If using signed, you'd currently only find out with users younger than 7 or so. You may be "lucky" enough not to have any, but kids are tech-savvy, so websites and systems in general may well have entries for kids younger than that.

- Using a timestamp for date-of-birth tells me that the developers are young ;-) Well, that's relative, but in this case: younger than 40. I was born in 1969, so I am very aware that it's impossible to represent my birth date in a unix timestamp! What dates do you test with? Your own, and those of people around you. 'Nuff said.

- Finally, INT(22) is still an INT, which for MySQL means 32 bits (4 bytes), and it happened to be signed too.
So, all in all, this wasn't going to work. Exactly what would fail where would be highly dependent on the app code (and the date), but you can tell it needs a quick redesign anyway.

I actually suggested checking the requirements to see whether having just a year would suffice for the intended use (it can be stored in a YEAR(4) field); this reduces the amount of personal data stored and thus removes privacy concerns. Otherwise, use a DATE field, optionally allowed to omit the day-of-month (i.e. only ask for year/month), as that again can be sufficient for the intended purpose.

Posted by Open Query on Mon, 29 Mar 2010.
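A quick way to see for yourself that the (n) in INT(n) is only a display width; a minimal sketch (table and column names are invented):

-- Both columns are 4-byte signed INTs with an identical range;
-- the display width changes neither storage nor the accepted values.
CREATE TABLE width_demo (a INT(2), b INT(20));
INSERT INTO width_demo VALUES (123456789, 123456789);  -- both succeed
SELECT * FROM width_demo;                              -- both return 123456789

-- A signed INT tops out at 2147483647, which as a unix timestamp lands in
-- early 2038, and typical unix-timestamp APIs cannot produce dates before
-- 1970-01-01 at all, hence the birth-date problem described above.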
Dell MD1120 Storage Array Performance (http://venublog.com/2010/03/28/dell-md1120-storage-array-performance/)

Here are some file I/O performance numbers from a Dell MD1120 SAS storage array. Last year I ran the same test with an HP P800 storage array (http://venublog.com/2009/04/04/hp-p800-smart-array-performance/), and the numbers were impressive. But when it comes to this high-end storage array, there were a few surprises. Before getting into the details, let's look at the test setup and configuration.
System configuration:

1. Dell R710 with CentOS 5.4
2. NOOP I/O scheduler
3. MD1120 with 22 10K SAS disks
   - 20-disk RAID-10 (hardware)
   - 2 hot spares
   - disk cache disabled
4. PERC 6/E RAID controller with BBU
   - connected to the MD1120 via SAS
   - write back
   - read cache disabled

Test configuration:

1. Sysbench fileio test with variable modes and thread counts
2. 64 files, 50 GB total size
3. All tests ran in unbuffered mode (O_DIRECT), as most of the workload is InnoDB-based

Test results: number of threads vs. number of requests/sec. Every mode ran for 5 iterations and the average is taken.

Random IO: [chart: threads vs. requests/sec]

Sequential IO: [chart: threads vs. requests/sec]

HDPARM test:

[test~]# for i in `seq 1 3`; do hdparm --direct -tT /dev/sdc1; done | grep Timing
 Timing O_DIRECT cached reads: 2068 MB in 2.00 seconds = 1033.21 MB/sec
 Timing O_DIRECT disk reads: 2146 MB in 3.00 seconds = 715.32 MB/sec
 Timing O_DIRECT cached reads: 2020 MB in 2.00 seconds = 1010.26 MB/sec
 Timing O_DIRECT disk reads: 2162 MB in 3.00 seconds = 720.62 MB/sec
 Timing O_DIRECT cached reads: 2052 MB in 2.00 seconds = 1025.90 MB/sec
 Timing O_DIRECT disk reads: 2128 MB in 3.00 seconds = 709.17 MB/sec

[test ~]# for i in `seq 1 3`; do hdparm -tT /dev/sdc1; done | grep Timing
 Timing cached reads: 18920 MB in 2.00 seconds = 9475.34 MB/sec
 Timing buffered disk reads: 3442 MB in 3.02 seconds = 1141.44 MB/sec
 Timing cached reads: 19332 MB in 2.00 seconds = 9681.56 MB/sec
 Timing buffered disk reads: 3478 MB in 3.00 seconds = 1159.24 MB/sec
 Timing cached reads: 18012 MB in 2.00 seconds = 9019.50 MB/sec
 Timing buffered disk reads: 3492 MB in 3.02 seconds = 1155.53 MB/sec

Analysis:

1. Overall the numbers are not bad for writes, but there were a few surprises for reads. Compared with HP's P800 storage array, the numbers dropped by 20%.
2. Random IO:
   - Random write requests range from 3200-5000 per second, thanks to write-back mode (512 MB cache).
   - Writes scale linearly with the thread count, a good sign that the controller manages its cache efficiently.
   - Mixed random reads and writes (rndrw) also scale linearly with the thread load, which suggests the IO distribution and the cache handling for reads are efficient, since data must be flushed from controller cache to disk before a read can be satisfied.
3. Sequential IO:
   - Writes seem to scale well in sequential mode too, without much overhead.
   - For reads, the big surprise is the drop from 5626 requests/sec to 615 going from one thread to two. That is really odd; worst case it should be ~2000-3000 requests/sec, and I am not sure where the overhead is. I can't believe it is thread scheduling, as there are only 2 threads.
4. During 100% IO, I intermittently noticed IO serialization with higher queue waits, which indicates some degree of serialization overhead in the OS; I have not been able to track down which layer triggers it. I tried cfq/deadline schedulers; still the same.
5. The next attempt will be replacing the 3Gb/s SAS with a fibre channel HBA or 6Gb/s SAS (PERC H800) to see how it performs, along with a combination of hardware and software RAID instead of depending only on the controller.
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Sun, 28 Mar 2010 23:48:54 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:12:{i:0;a:5:{s:4:"data";s:8:"Database";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:8:"Hardware";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:5:"MySQL";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:3;a:5:{s:4:"data";s:11:"Scalability";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:4;a:5:{s:4:"data";s:16:"Dell R710 MD1120";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:5;a:5:{s:4:"data";s:16:"IO serialization";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:6;a:5:{s:4:"data";s:18:"MD1120 performance";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:7;a:5:{s:4:"data";s:19:"MySQL Dell Hardware";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:8;a:5:{s:4:"data";s:29:"MySQL Dell MD1120 performance";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:9;a:5:{s:4:"data";s:24:"MySQL MD1120 performance";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:10;a:5:{s:4:"data";s:17:"mysql performance";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:11;a:5:{s:4:"data";s:25:"RAID 10 MySQL performance";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:6331:"<p>Here is some file IO performance numbers from <a href="http://www.dell.com/us/en/business/storage/storage_powervault_md1120/pd.aspx?refid=storage_powervault_md1120&cs=04&s=bsd">DELL MD1120</a> SAS storage array. Last year I did the same test with <a href="http://venublog.com/2009/04/04/hp-p800-smart-array-performance/">HP P800 storage array</a> and numbers were impressive. But when it comes to this high end storage array, few surprises.  Before getting into actual details; lets see the test stats and configuration details.</p> <h4>System Configuration:</h4> <ol> <li><a href="http://www.dell.com/us/en/enterprise/servers/server-poweredge-r710/pd.aspx?refid=server-poweredge-r710&cs=555&s=biz">DELL R710</a> with CentOS 5.4 </li> <li>NOOP IO Scheduler </li> <li>MD1120 with 22 10K SAS disks <ul> <li>20 disk RAID-10 (hardware) </li> <li>2 hot spares </li> <li>Disk Cache disabled </li> </ul> </li> <li><a href="http://www.dell.com/content/topics/topic.aspx/global/products/pvaul/topics/en/us/raid_controller?c=us&l=en&cs=555">PERC 6/E RAID controller</a> with BBU <ul> <li>Connected to DELL MD1120 using SAS </li> <li>Write Back </li> <li>Read Cache Disabled </li> </ul> </li> </ol> <h4>Test Configuration:</h4> <ol> <li>Sysbench fileio test with variable modes and threads </li> <li>64 files with 50G total size </li> <li>All tests ran in un-buffered mode (O_DIRECT) as most of the workload is InnoDB based. </li> </ol> <h4>Test Results:</h4> <p>Number of Threads vs Number of Requests/Sec. 
Fast reads or fast scans? (http://mysqlha.blogspot.com/2010/03/fast-reads-or-fast-scans.html)

MyISAM is frequently described and marketed as providing fast reads, when it really provides fast index and table scans. That is a narrower use case: fast reads imply great performance for most queries, while fast scans imply great performance only for single-table queries that are index-only or do a full table scan.

MyISAM caches index blocks but not data blocks, so there can be a lot of overhead from re-reading data blocks from the OS buffer cache (assuming mmap is not used). InnoDB and PBXT are 20X faster than MyISAM on some of my tests. However, I suspect that mutex contention on the key cache is also a factor in the performance differences.

While there are many claims about the great performance of MyISAM,
there are not as many examples that explain when it is fast. Alas, the same marketing technique is now being repeated with NoSQL, to the disadvantage of MySQL.

- http://dev.mysql.com/doc/refman/5.5/en/ansi-diff-transactions.html
- http://dev.mysql.com/doc/refman/5.0/en/ansi-diff-foreign-keys.html
- http://www.mysql.com/products/dw
- http://dev.mysql.com/doc/refman/5.5/en/storage-engine-compare-transactions.html

Tests were run on a server that reports 16 CPU cores. The full test configuration is described elsewhere (http://www.facebook.com/notes/mysqlfacebook/pbxt-still-looks-good/379934640932). For this test I modified the sysbench oltp test to do a self-join query. I will publish the code soon. The schema for the test is:

CREATE TABLE sbtest (
  id int(10) unsigned NOT NULL AUTO_INCREMENT,
  k int(10) unsigned NOT NULL DEFAULT '0',
  c char(120) NOT NULL DEFAULT '',
  pad char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (id),
  KEY k (k)
) ENGINE=InnoDB;

The self-join query uses a range predicate that selects a fixed number (1, 10, 100, 1000, or 10000) of rows. This example selects 1000 rows:

SELECT t1.c, t2.c FROM sbtest t1, sbtest t2
WHERE t1.id BETWEEN 245793 AND 246792 AND t2.id = 2000000 - t1.id

Tests were run using MySQL 5.1.45 for MyISAM, InnoDB plugin 1.0.6, and PBXT 1.1. Results are in queries per second for 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, and 1024 concurrent clients; results for 512 and 1024 clients are omitted to avoid long lines in this post. The performance of MyISAM gets much worse relative to InnoDB and PBXT as the number of rows selected grows from 1 to 10,000.

Queries per second (clients: 1, 2, 4, 8, 16, 32, 64, 128, 256) when the BETWEEN predicate selects 1 row:

  innodb:  6843 13157 24552 46822 62588 57023 46568 30582 18745
  pbxt:    6164 13627 25671 48705 63741 59217 48300 30964 18866
  myisam:  6354 12061 23373 44284 50778 49546 44412 30444 18827

...10 rows:

  innodb:  4240  8466 16387 33221 53902 39599 36214 28026 18084
  pbxt:    4802  8835 17688 35917 57461 47691 41578 29087 18558
  myisam:  3890  7129 12512 16450 12272 12304 12441 12448 11304

...100 rows:

  innodb:  1842  3455  7249 14842 20206 13875 13471 12942 12344
  pbxt:    2113  3522  7893 13411 18597 18905 18694 18123 12301
  myisam:  1608  2260  2263  1899  1371  1399  1451  1468  1442

...1000 rows:

  innodb:   380   654  1222  2023  2487  1866  1791  1794  1942
  pbxt:     303   641  1149  1699  2044  2069  2072  2063  2056
  myisam:   232   248   227   189   141   143   149   148   148

...10000 rows:

  innodb:    43    70   130   213   254   199   194   196   199
  pbxt:      49    69   123   182   213   216   216   216   216
  myisam:    24    24    23    19    14    14    15    15    15

MyISAM is at a disadvantage because it does not cache data blocks, so I changed the query to be index-only, as listed below. This did not make MyISAM faster; I think the bottleneck is contention on the key cache mutex.

SELECT t1.id, t2.id FROM sbtest t1, sbtest t2
WHERE t1.id BETWEEN 245793 AND 246792 AND t2.id = 2000000 - t1.id

Queries per second for the 1000-row range using the index-only query:

  innodb:   457   706  1354  2146  2596  2044  1918  1887  1953
  pbxt:     576   837  1386  1681  2058  2094  2103  2095  2087
  myisam:   353   244   223   190   140   142   147   146   146

Results for MySQL 5.0.84 are similar to 5.1.45 for the 1000-row range query:

  innodb:   390   642  1241  2045  2547  1891  1825  1813  1930
  myisam:   303   239   225   189   140   141   147   146   146

The query plan for the basic query:

EXPLAIN SELECT t1.c, t2.c
FROM sbtest t1, sbtest t2
WHERE t1.id BETWEEN 245793 AND 246792 AND t2.id = 2000000 - t1.id
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 1072
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
         type: ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: func
         rows: 1
        Extra: Using where; Using index
2 rows in set (0.01 sec)

The query plan for the index-only join:

EXPLAIN SELECT t1.id, t2.id
FROM sbtest t1, sbtest t2
WHERE t1.id BETWEEN 1916457 AND 1917456 AND t2.id = 2000000 - t1.id

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 978
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: func
         rows: 1
        Extra: Using where; Using index
2 rows in set (0.00 sec)

Posted Sun, 28 Mar 2010.
The need for tunability and measurability
Baron Schwartz (xaprb), Sun, 28 Mar 2010 23:34:34 +0000
http://www.xaprb.com/blog/2010/03/28/the-need-for-tunability-and-measurability/
Tags: PostgreSQL, SQL, instrumentation, Oracle, Tom Kyte, Tunability

To program is human; to instrument is divine. Complex systems that will support a heavy workload will eventually have to be tuned for it. There are two prerequisites for tuning: tunability and measurability.

Tunability generally means configuration settings. Adding configuration settings is a sign of a humble and wise programmer. It means that the programmer acknowledges "I don't understand how this system will be used, what environment it will run in, or even what my code really does." Sometimes things are hard-coded. InnoDB is notorious for this, although don't take that to mean that I think Heikki Tuuri isn't humble and wise; nobody's perfect. Sometimes programmers set out to create systems that are self-tuning. I'm not aware of any success stories I can point to in this regard, but I can point to plenty of failures. Perhaps I can't think of any successes because I don't need to.

Measurability (instrumentation) is the next sign of a wise and humble programmer. If your system must be tuned, then it needs to be measured to enable wise decisions. There are at least two important kinds of metrics, but that is a subject for another blog post. Most large systems I've worked with (primarily database systems, but operating systems too) are seriously lacking in measurability. A programmer who makes the system measurable acknowledges "I might be wrong, and if I am, it's a good thing to enable people to prove it," and realizes that "you cannot improve what you cannot measure."

Complex, high-load systems get micro-optimized, making them even more opaque. By the time an I/O operation in InnoDB reaches the disk, it's often impossible to blame it on a specific query. That is not just because of a lack of instrumentation; even with perfect instrumentation, I/O operations wouldn't be assignable one-to-one with user actions. Optimization does that, because a lot of optimizations are about deferring, anticipating, or combining work. That makes instrumentation even more important.
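To make the granularity problem concrete (this example is mine, not Baron's): the instrumentation stock MySQL does offer is mostly server-wide counters plus one monolithic status dump, and none of it names a query:

SHOW GLOBAL STATUS LIKE 'Innodb_data_%';   -- server-wide I/O counters
SHOW ENGINE INNODB STATUS\G                -- one text report for the whole server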
This weekend, I heard conflicting stories about instrumentation in Postgres. Someone claimed to have offered patches with a detailed set of instrumentation (I'd also heard this story from someone else at the same company, six months ago in a different place). He told me that the maintainers had declined it on the basis of the added overhead. Someone else told me that no such offer had been made, at least not in public where the decision could be taken to the mailing lists. I don't know what's true. I do know that stock Postgres is virtually un-instrumented in ways that matter a lot. The same can be said of MySQL, although interestingly the Venn diagram of the ways these two projects are instrumented doesn't overlap all that much.

The performance and maintenance cost of adding instrumentation to an application pales in comparison to the benefits. There's a famous quote from Oracle guru Tom Kyte, who when asked about the cost of Oracle's performance instrumentation, estimated it at negative ten percent. That is, without the ability to measure Oracle and thus improve it, it'd be at least ten percent slower. I think ten percent is a modest estimate for most systems I work with.
Sales en: MySQL, where are you going?
Sun, 28 Mar 2010 18:51:09 +0000
http://www.fromdual.ch/node/142

Our presentation "MySQL, where are you going?" of March 25 at the OpenExpo (http://www.openexpo.ch/) in Bern is now available in German (http://www.fromdual.ch/sites/default/files/mysql-wohin-gehst-du_0.pdf) and English (http://www.fromdual.ch/sites/default/files/mysql-where-are-you-going.pdf). If you missed it, you can download it now from http://www.fromdual.ch/presentations ...
The video recording should be available soon as well.

Do we still need innodb_thread_concurrency?
Mark Callaghan, Sun, 28 Mar 2010 15:44:00 +0000
http://mysqlha.blogspot.com/2010/03/do-we-still-need-innodbthreadconcurrenc.html
Tags: performance, mysql, innodb

Baron wrote this in a comment to a recent blog post (http://www.xaprb.com/blog/2010/03/04/a-growing-trend-innodb-mutex-contention/):

"I consider innodb_thread_concurrency a vestigial tail of the “built-in InnoDB” that ships by default with MySQL 5.0 or 5.1, and should generally be set to 0 with recent versions of XtraDB or the InnoDB Plugin."

Can this be? I cannot wait for innodb_thread_concurrency to be made obsolete. I run a lot of CPU-bound benchmarks on 8 and 16 core servers and I always set it to 0 in my benchmark framework. A few times I have repeated tests with it set to a non-zero value to understand whether that helps, and it has never helped. Alas, this will also make my FLIFO patch (http://www.facebook.com/note.php?note_id=175800920932) obsolete.

I agree with Baron that it should be set to 0 with the InnoDB plugin and XtraDB.
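For reference, a sketch of how to inspect and flip the setting at runtime (my illustration, not from the post); innodb_thread_concurrency is a dynamic variable in these versions, so no restart is needed:

SHOW GLOBAL VARIABLES LIKE 'innodb_thread_concurrency';
SET GLOBAL innodb_thread_concurrency = 0;   -- 0 disables the admission throttle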
This is a big deal that has not received enough attention. InnoDB and XtraDB have gotten much better at supporting highly-concurrent workloads on many-core servers. For me, highly-concurrent means 100 to 1000 concurrent transactions and many-core means 8 and 16 core servers. This is not an easy workload to support. MySQL is getting much better at it. A lot of work remains to be done. MySQL 5.5 has even more improvements, and several problems have yet to be fixed in InnoDB. But this is a huge deal. Maybe we can have a going away party for innodb_thread_concurrency at the conference (http://en.oreilly.com/mysql2010/)?
More fun with the MySQL Audit Plugin API
Anders Karlsson, Sun, 28 Mar 2010 13:10:00 +0000
http://karlssonondatabases.blogspot.com/2010/03/more-fun-with-mysql-audit-plugin-api.html
Tags: mysql audit plugin API

The Audit API has more uses than you may think! When a statement is executed in the server, the notification function in this API will be called, and we can use that to do some interesting things.

Like: ever wanted to know what the most executed query in your running system is? I mean, the information is in there somewhere, right? It's just a question of how to get at it. And frankly, I don't think the queries

SELECT * FROM user_data WHERE user_id = 57;

and

SELECT * FROM user_data WHERE user_id = 113;

should count as different queries. I mean, hey, the issue is that the application may be issuing the same query too many times, mostly with different arguments, but still the query

SELECT * FROM user_data WHERE user_id = <somevalue>

is executed too often. And if so, how can I figure it out? Well, using MySQL 5.5 with a combination of the Audit Plugin API and the INFORMATION_SCHEMA API, you can do this. The principle is simple.

The logging of queries

This is handled by the notification function in the Audit API. In the plugin, we have an array of queries, and for every query we see, we "normalize" it, in the sense that all literals are replaced by question marks, i.e. the query above would be represented as:

SELECT * FROM user_data WHERE user_id = ?

Then we look for this query among the queries I already know about. If the query is found, I just increment the execution count for it, and if it is not found, I add it. I keep track of X number of queries, and I log the time each query is added, so when I add a new query and there already are X queries logged, I get rid of the one that was executed longest ago. An LRU list, in short. In the sample code I provide eventually, I also log some other data, but this is the basics.
And the list is stored in memory, by the way.

Seeing the latest queries

For viewing the queries, I use an INFORMATION_SCHEMA plugin that is part of the same plugin library as the one above, although this is a different type of plugin. This is again pretty simple. The plugin will expose an INFORMATION_SCHEMA table that contains the list of the queries above, and when I select from that table, I materialize the list as a table. As this will then look like any normal table, you can use just about any SQL operation on it, like ordering, filtering for specific queries, etc. And all this with a simple SELECT!
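For example (my sketch, using the my_is_proclist columns demonstrated in the session output below), the original question, which query runs most often, becomes a one-liner:

SELECT statement, num_executes, last_executed
FROM information_schema.my_is_proclist
ORDER BY num_executes DESC
LIMIT 5;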
The issues

Will this affect performance? Yes, sure, but I don't really know to what extent right now; I haven't gotten around to testing that. And then there is one more issue: you need MySQL 5.5 m3 or above, which currently means you need to get it from Launchpad (https://launchpad.net/mysql-server/5.5); there are still (March 29) only m2 binaries to be found at http://dev.mysql.com.

Example code

Now it's time for some sample code. Let's first have a look at how it works. There is an information_schema table introduced by the plugin called my_is_proclist, and this is the one we are interested in. I have a table in the test database that I issue queries on, and it looks like this:

MySQL [test]> select last_executed, num_executes, statement from information_schema.my_is_proclist where statement like '%from t1%';
+---------------------+--------------+-------------------------------+
| last_executed       | num_executes | statement                     |
+---------------------+--------------+-------------------------------+
| 2010-02-28 15:12:26 |            4 | select * from t1              |
| 2010-02-28 15:34:07 |            6 | select * from t1 where c1 < ? |
+---------------------+--------------+-------------------------------+
2 rows in set (0.00 sec)

MySQL [test]> select * from t1 where c1 < ...
+------+------+
| c1   | c2   |
+------+------+
|    1 |   16 |
|    2 |   16 |
|    3 |   16 |
+------+------+
3 rows in set (0.00 sec)

MySQL [test]> select last_executed, num_executes, statement from information_schema.my_is_proclist where statement like '%from t1%';
+---------------------+--------------+-------------------------------+
| last_executed       | num_executes | statement                     |
+---------------------+--------------+-------------------------------+
| 2010-02-28 15:12:26 |            4 | select * from t1              |
| 2010-02-28 15:34:12 |            7 | select * from t1 where c1 < ? |
+---------------------+--------------+-------------------------------+
2 rows in set (0.00 sec)

As you can see, the counter goes up when I execute the query in question. Nothing complicated, and as you can see, I can do all sorts of filtering on the INFORMATION_SCHEMA table.

Installing the plugin

The plugins are contained within one library. I have the source code and a simple makefile downloadable here: http://www.papablues.com/src/my_proclist.tar.gz

Modify the supplied Makefile appropriately and run:

make
make install

Note that you must have rights to write to the MySQL Server plugin directory for the install to work! Once this is done, you must tell the MySQL server about the plugins:

install plugin my_proclist soname 'my_proclist.so';
install plugin my_is_proclist soname 'my_proclist.so';

And that's it! Note that there is also a status variable that shows how many statements I currently keep track of. Currently, the max number of statements I track is 30, but there is a simple

#define NUM_STMTS 30

in my_proclist.cc that you may edit to change that.
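The post does not name that status variable, so the pattern below is only a guess at it; filtering SHOW STATUS is the generic way to hunt for it:

SHOW GLOBAL STATUS LIKE '%proclist%';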
/Karlsson
PBXT still looks good
Sat, 27 Mar 2010 03:56:04 +0000
http://www.facebook.com/note.php?note_id=379934640932

I ran several CPU-bound tests using sysbench (http://launchpad.net/sysbench/0.4) and MySQL 5.1.45 with the Facebook patch for MySQL 5.1 (http://code.launchpad.net/~mysqlatfacebook/mysqlatfacebook/5.1), PBXT 1.1, the InnoDB plugin 1.0.6 and MyISAM. The server reports 16 x86 CPU cores. The Facebook patch has many useful changes and we are still trying to figure out whether they improve performance for sysbench. All tests were run with a warm buffer cache for 1, 2, 4, 8, 16, 32, 64, 128, 256, 512 and 1024 threads. The data was cached by each storage engine, with one exception: MyISAM does not cache data blocks.

The results are interesting. PBXT does great and is similar to InnoDB except on the oltp read-only test that uses HANDLER, where it is much faster at high concurrency. I am not ready to switch yet, but this is another positive step in my evaluation. I need to run a few IO-bound tests with a read-write workload. Eventually I will test it with a real load from production. I don't understand the poor performance of MyISAM on the oltp read-only test; nothing in the workload takes a table-level lock.

The MySQL build was configured via:

configure --enable-thread-safe-client \
  --prefix=/data/5145fb \
  --exec-prefix=/data/5145fb \
  --with-plugins=csv,blackhole,myisam,heap,innodb_plugin,pbxt \
  --without-plugin-innobase \
  --with-fast-mutexes \
  --with-unix-socket-path=/data/5145fb/var/mysql.sock \
  --with-extra-charsets=all \
  C_EXTRA_FLAGS="-g -O2 -fno-omit-frame-pointer"

These values were used for the my.cnf file. The InnoDB plugin is linked into mysqld via the Facebook patch, so there are no plugin options for it here.
[mysqld]
plugin-load=libpbxt.so
pbxt
pbxt_index_cache_size=250M
pbxt_record_cache_size=750M
pbxt_flush_log_at_trx_commit=2
innodb_buffer_pool_size=1000M
innodb_log_file_size=100M
innodb_flush_log_at_trx_commit=2
innodb_doublewrite=0
innodb_flush_method=O_DIRECT
innodb_thread_concurrency=0
innodb_max_dirty_pages_pct=80
innodb_file_per_table
innodb_file_format=barracuda
max_connections=2000
table_cache=2000
key_buffer_size=1000M

Results in transactions per second for oltp read-write; columns are 1 to 1024 threads in the order listed above. MyISAM used LOCK/UNLOCK TABLES instead of BEGIN/COMMIT. Each transaction is 14 SELECT statements, 3 UPDATE statements, 1 DELETE, 1 INSERT and either BEGIN/COMMIT or LOCK/UNLOCK TABLES.

  109  233  452  853 1196 1200 1193 1160 1134 1041  604 innodb
  116  229  460  783 1098 1090  832 1034  690  839  580 pbxt
  115   78   76   68   71   63   71   69   71   73   72 myisam

Results in transactions per second for oltp read-write as above, with the binlog enabled and these additions to my.cnf: log_bin, binlog_format=row, innodb_autoinc_lock_mode=2, sync_binlog=1

   98  185  421  739 1022  997 1035 1033 1047  963  607 innodb
  103  194  443  751 1026 1040 1021  985  825  403  366 pbxt
   75   73   69   67   67   59   61   60   59   65   64 myisam

Results in transactions per second for oltp read-only. Each transaction is 14 SELECT statements, 1 BEGIN and 1 COMMIT.

  121  289  625 1115 1499 1519 1513 1490 1462 1401  802 innodb
  121  265  565 1112 1438 1447 1446 1433 1406 1373  805 pbxt
  119  208  338  332  270  268  268  268  267  263  266 myisam

Results in transactions per second for oltp read-only using fetch-by-primary-key. Each transaction is one SELECT statement.

 7058 14515 29477 52696 103555 73668 69167 62377 45565 29851 17658 innodb
 8001 14564 29787 54532 105984 76927 72830 65158 46798 30596 17627 pbxt
 7526 14783 29631 51748  70366 68800 66338 61183 45576 30003 17446 myisam

Results in transactions per second for oltp read-only using fetch-by-primary-key. Each transaction is one HANDLER statement.

 9511 18313 35725 63102 139822 117702  86835  88324  75762 37757 14561 innodb
 9500 19131 35220 61759 147070 127080 117668 117039 105228 89451 75212 pbxt
 8140 17085 34085 61101  85703  87634  87363  86349  84281 71658 66039 myisam
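For readers who have not used HANDLER: a fetch-by-primary-key via HANDLER looks roughly like the following (the id value is arbitrary and this is an illustration, not the exact statements sysbench issues):

HANDLER sbtest OPEN;
HANDLER sbtest READ `PRIMARY` = (1234567);   -- point fetch on the primary key
HANDLER sbtest CLOSE;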
sysbench command line for oltp read-write, excluding the connect options:

sysbench --test=oltp --oltp-table-size=2000000 --max-time=60 --max-requests=0 \
  --mysql-table-engine=innodb --db-ps-mode=disable --mysql-engine-trx=yes \
  --oltp-dist-type=uniform --oltp-range-size=1000 --num-threads=$nt --seed-rng=$nt run

sysbench command line for oltp read-only, excluding the connect options:

sysbench --test=oltp --oltp-table-size=2000000 --max-time=60 --max-requests=0 \
  --mysql-table-engine=innodb --db-ps-mode=disable --mysql-engine-trx=no \
  --oltp-read-only --oltp-dist-type=uniform --oltp-range-size=1000 \
  --num-threads=$nt --seed-rng=$nt run

sysbench command line for oltp read-only using fetch-by-primary-key via SELECT, excluding the connect options:

sysbench --test=oltp --oltp-table-size=2000000 --max-time=60 --max-requests=0 \
  --mysql-table-engine=innodb --db-ps-mode=disable --mysql-engine-trx=no \
  --oltp-read-only --oltp-skip-trx --oltp-test-mode=simple --oltp-point-select-all-cols \
  --oltp-dist-type=uniform --oltp-range-size=1000 --num-threads=$nt --seed-rng=$nt run

sysbench command line for oltp read-only using fetch-by-primary-key via HANDLER, excluding the connect options:

sysbench --test=oltp --oltp-table-size=2000000 --max-time=60 --max-requests=0 \
  --mysql-table-engine=innodb --db-ps-mode=disable --mysql-engine-trx=no \
  --oltp-read-only --oltp-skip-trx --oltp-test-mode=simple --oltp-point-select-mysql-handler \
  --oltp-dist-type=uniform --oltp-range-size=1000 \
  --num-threads=$nt --seed-rng=$nt run

For the oltp read-write test, a typical transaction is listed below.
For MyISAM, LOCK TABLES sbtest WRITE is used in place of BEGIN and UNLOCK TABLES in place of COMMIT:

BEGIN;
SELECT c from sbtest where id=98548;
SELECT c from sbtest where id=222643;
SELECT c from sbtest where id=752541;
SELECT c from sbtest where id=105064;
SELECT c from sbtest where id=1681121;
SELECT c from sbtest where id=1387911;
SELECT c from sbtest where id=255914;
SELECT c from sbtest where id=1902428;
SELECT c from sbtest where id=1861844;
SELECT c from sbtest where id=923408;
SELECT c from sbtest where id between 1838755 and 1839754;
SELECT SUM(K) from sbtest where id between 172006 and 173005;
SELECT c from sbtest where id between 1299287 and 1300286 order by c;
SELECT DISTINCT c from sbtest where id between 1171355 and 1172355 order by c;
UPDATE sbtest set k=k+1 where id=932987;
UPDATE sbtest set c='36106073-929398913-440777565-168776012-39164047-125155124-555405956-995035124-527621711-785880565' where id=1038959;
UPDATE sbtest set k=k+1 where id=887049;
DELETE from sbtest where id=887049;
INSERT INTO sbtest values(887049,0,' ','aaaaaaaaaaffffffffffrrrrrrrrrreeeeeeeeeeyyyyyyyyyy');
COMMIT;

For the oltp read-only test, a typical transaction is:

BEGIN;
SET timestamp=1269656402;
SELECT c from sbtest where id=1349447;
SELECT c from sbtest where id=1088420;
SELECT c from sbtest where id=713733;
SELECT c from sbtest where id=301075;
SELECT c from sbtest where id=118433;
SELECT c from sbtest where id=779447;
SELECT c from sbtest where id=458388;
SELECT c from sbtest where id=670618;
SELECT c from sbtest where id=1826244;
SELECT c from sbtest where id=1329702;
SELECT c from sbtest where id between 1213862 and 1214861;
SELECT SUM(K) from sbtest where id between 671772 and 672771;
SELECT c from sbtest where id between 1072387 and 1073386 order by c;
SELECT DISTINCT c from sbtest where id between 1626563 and 1627563 order by c;
COMMIT;
My Impressions About MONyog
Fri, 26 Mar 2010 23:55:00 +0000
http://mmatemate.blogspot.com/2010/03/my-impressions-about-monyog.html
Tags: diagnosis, tools, mysql, monyog, linux, users conference, dba

At work we have been looking for tools to monitor MySQL and at the same time provide as much diagnosis information as possible upfront when an alarm is triggered. After looking around at different options, I decided to test MONyog from Webyog (http://webyog.com/en/), the makers of the better known SQLyog.
Before we go on, the customary disclaimer: this review reflects my own opinion and in no way represents any decision that my current employer may or may not make in regard to this product.

First Impression

You know what they say about first impressions, and this is where MONyog started off on the right foot. Since it is an agent-less system, you only need to install the RPM or untar the tarball on the server that will run the monitor and launch the daemon to get started. How much faster or simpler can it be? But in order to start monitoring a server you need to do some preparation on it: create a MONyog user for both the OS and the database. I used the following commands.

For the OS user, run the following command as root (thank you Tom):

groupadd -g 250 monyog && useradd -c 'MONyog User' -g 250 -G mysql -u 250 monyog && echo 'your_os_password' | passwd --stdin monyog

For the MySQL user, run:

GRANT SELECT, RELOAD, PROCESS, SUPER on *.* to 'adm_monyog'@'10.%' IDENTIFIED BY 'your_db_password';

Keep in mind that passwords are stored in the clear in the MONyog configuration database; defining a dedicated MONyog user helps to minimize security exposure. Although for testing purposes I decided to go with a username/password combination to SSH into the servers, it is possible to use a key, which would be my preferred setting in production.

The User Interface

The system UI is web driven using Ajax and Flash, which makes it really thin and portable. I was able to test it without any issues using IE 8 and Firefox on Windows and Linux. Chrome presented some minor challenges, but I didn't dig any deeper since I don't consider it stable enough yet and didn't want to get distracted by what could have been browser-specific issues.

To access MONyog you just point your browser at the server where it was installed, with a URL equivalent to http://monyog-test.domain.com:5555 or http://localhost:5555. You will always land on the List of Servers tab. At the bottom of this page there is a Register a New Server link that you follow to start adding servers at will. The process is straightforward, and at any point you can trace your steps back to make corrections as needed (see screenshot). Once you enter the server information with the credentials defined in the previous section, you are set. Once I went through the motions, the first limitation became obvious: you have to repeat the process for every server; although there is an option to copy from previously defined servers, it can become a very tedious process.

Once you have the servers defined, to navigate the actual system you check which servers you want to review, select the proper screen from a drop-down box at the bottom of the screen and hit Go. This method seems straightforward, but at the beginning it is a little confusing and takes some time to get used to.

Features

MONyog has plenty of features that make it worth trying if you're looking for monitoring software for MySQL. Hopefully by now you have it installed and ready to go, so I'll comment from a big-picture point of view and let you reach your own conclusions.

The first feature that jumps right out at me is its architecture, in particular the scripting support. All the variables it picks up from the servers it monitors are abstracted as JavaScript-like objects, and all the monitors, graphics and screens are based on these scripts. On the plus side, this adds a lot of flexibility to how you can customize the alerts, monitors, rules and Dashboard display.
The User Interface

The system UI is web driven, using Ajax and Flash, which makes it really thin and portable. I was able to test it without any issues using IE 8 and Firefox on Windows and Linux. Chrome presented some minor challenges, but I didn't dig any deeper since I don't consider it stable enough and didn't want to get distracted by what could have been browser-specific issues. To access MONyog you just point your browser at the server where it was installed, with a URL equivalent to: http://monyog-test.domain.com:5555 or http://localhost:5555. You will always land in the List of Servers tab. At the bottom of this page there is a Register a New Server link that you follow to start adding servers at will. The process is straightforward, and at any point you can trace your steps back to make any corrections as needed (see screenshot). Once you enter the server information with the credentials defined in the previous section, you are set. Once I went through the motions, the first limitation became obvious: you have to repeat the process for every server; although there is an option to copy from previously defined servers, it can become a very tedious process. Once you have the servers defined, to navigate the actual system you check which servers you want to review, select the proper screen from a drop-down box at the bottom of the screen, and hit Go. This method seems straightforward, but at the beginning it is a little bit confusing and takes some time to get used to.

Features

MONyog has plenty of features that make it worth trying if you're looking for monitoring software for MySQL. Hopefully by now you have it installed and ready to go, so I'll comment from a big-picture point of view and let you reach your own conclusions. The first feature that jumps right out at me is its architecture, in particular the scripting support. All the variables it picks up from the servers it monitors are abstracted into JavaScript-like objects, and all the monitors, graphics and screens are based on these scripts. On the plus side, this adds a lot of flexibility to how you can customize the alerts, monitors, rules and Dashboard display. On the other hand, this flexibility presents some management challenges: customizing thresholds, alerts and rules per server or group of servers, and backing up the customized rules. None of these challenges is a showstopper, and I'm sure Webyog will come up with solutions in future releases. Since everything is stored in SQLite databases and the repositories are documented, any SQLite client and some simple scripting are enough to take backups and work around the limitations (see the sketch below). The agent-less architecture requires the definition of users to log into the database and the OS in order to gather the information it needs. The weak point here is that the credentials, including passwords, are stored in the clear in the SQLite databases. A way to secure this is to properly limit the GRANTs for the MySQL users and to SSH using a DSA key instead of a password. Again, no showstopper for most installations, but it needs some work from Webyog's side to increase the overall system security. During our tests we ran into a bug in the SSH library used by MONyog. I engaged their Technical Support, looking forward to evaluating their overall responsiveness. I have to say it was flawless: at no point did they treat me in a condescending manner; they made the most of the information I provided upfront and never wasted my time with scripted, useless diagnostic routines. They had to provide me with a couple of binary builds, which they did in a very reasonable time frame. All in all, a great experience.

My Conclusion

MONyog doesn't provide any silver bullet or obscure best-practice advice. It gathers all the environment variables effectively and presents them in an attractive, easy-to-read format. Although it is closed-source commercial software, the architecture is quite open through scripting, and with well-documented repositories it provides a lot of flexibility for customizations and expansions to fit any installation's needs. For installations with over 100 servers it might be more challenging to manage the servers' configurations, and the credentials stored in the clear may not be viable for some organizations. If these two issues are not an impediment, I definitely recommend that any MySQL DBA download the binaries and take MONyog for a spin. It might be the solution you were looking for to keep an eye on your set of servers while freeing some time for other tasks. Let me know what you think, and if you plan to be at the MySQL UC, look me up to chat.
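The repository backup mentioned above could be scripted along these lines (a minimal sketch; the MONyog data directory, file extension and backup destination are assumptions, so check where your installation keeps its SQLite files):

<pre>
# Dump every MONyog SQLite repository to a plain-SQL backup file.
for db in /usr/local/MONyog/data/*.data; do
    sqlite3 "$db" .dump > /backup/monyog/"$(basename "$db")".sql
done
</pre>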
Maybe we can invite Rohit Nadhani from Webyog to join us.";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Fri, 26 Mar 2010 23:55:00 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:7:{i:0;a:5:{s:4:"data";s:9:"diagnosis";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:5:"tools";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:5:"mysql";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:3;a:5:{s:4:"data";s:6:"monyog";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:4;a:5:{s:4:"data";s:5:"linux";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:5;a:5:{s:4:"data";s:16:"users conference";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:6;a:5:{s:4:"data";s:3:"dba";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:7415:"<h1></h1>At work we have been looking for tools to monitor MySQL and at the same time provide as much diagnosis information as possible upfront when an alarm is triggered. After looking around at different options, I decided to test <b>MONyog</b> from <a href="http://webyog.com/en/" title="Webyog">Webyog</a>, the makers of the better known <b>SQLyog</b>. Before we go on, the customary disclaimer: <i>This review reflects <b>my own opinion</b> and in no way represents any decision that my current employer may or may not make in regards of this product.</i><br /><h2>First Impression</h2>You know what they say about the first impression, and in this where MONyog started with the right foot. Since it is an agent-less system, it only requires to install the RPM or untar the tarball in the server where you're going to run the monitor and launch the daemon to get started. How much faster or simpler can it be? But in order to start monitoring a server you need to do some preparations on it. Create a MONyog user for both the OS and the database. I used the following commands:<br /><br />For the OS user run the following command as <i>root</i> (thank you Tom):<br /><blockquote>groupadd -g 250 monyog && useradd -c 'MONyog User' -g 250 -G mysql -u 250 monyog && echo 'your_os_password' | passwd --stdin monyog</blockquote>For the MySQL user run:<br /><blockquote>GRANT SELECT, RELOAD, PROCESS, SUPER on *.* to 'adm_monyog'@'10.%' IDENTIFIED BY 'your_db_password';</blockquote>Keep in mind that passwords are stored in the clear in the MONyog configuration database, defining a MONyog user helps to minimize security breaches. Although for testing purposes I decided to go with a username/password combination to SSH into the servers, it is possible to use a key which would be my preferred setting in production. <br /><br /><h2>The User Interface</h2>The system UI is web driven using <a href="http://en.wikipedia.org/wiki/Ajax_(programming)" title="Ajax">Ajax</a> and <a href="http://en.wikipedia.org/wiki/Adobe_Flash" title="Flash">Flash</a> which makes it really thin and portable. I was able to test it without any issues using IE 8 and Firefox in Windows and Linux. 
Chrome presented some minor challenges but I didn't dig any deeper since I don't consider it stable enough and didn't want to get distracted with what could've been browser specific issues.<br /><br />In order to access MONyog you just point your browser the server where it was installed with an URL equivalent to:<br /><blockquote>http://monyog-test.domain.com:5555 <i>or</i> http://localhost:5555</blockquote>You will always land in the <a href="http://webyog.com/images/screenshots_monyog/ListMultipleServers.jpg" title="List of Servers tab">List of Servers tab</a>. At the bottom of this page there is a <b>Register a New Server</b> link that you follow and start adding servers at will. The process is straight forward and at any point you can trace your steps back to make any corrections as needed (see <a href="http://webyog.com/images/screenshots_monyog/NewConnection1.jpg" title="screenshot">screenshot</a>). Once you enter the server information with the credentials defined in the previous section, you are set. Once I went through the motions, the first limitation became obvious: You have to repeat the process for <i>every</i> server, although there is an option to copy from previously defined servers, it can become a very tedious process.<br /><br />Once you have the servers defined, to navigate into the actual system you need to check which servers you want to review, select the proper screen from a drop down box at the bottom of the screen and hit <b>Go</b>. This method seems straight forward, but at the beginning it is a little bit confusing and it takes some time to get used to it.<br /><h2>Features</h2>MONyog has plenty of features that make it worth trying if you're looking for a monitoring software for MySQL. Hopefully by now you have it installed and ready to go, so I'll comment from a big picture point of view and let you reach your own conclusions.<br /><br />The first feature that jumps right at me is its architecture, in particular the scripting support. All the variables it picks up from the servers it monitors are abstracted in JavaScript like objects and all the monitors, graphics and screens are based on these scripts. One the plus side, it adds a a lot of flexibility to how you can customize the alerts, monitors, rules and Dashboard display. On the other hand, this flexibility present some management challenges: customize thresholds, alerts and rules by servers or group of servers and backup of customized rules. None of these challenges are a showstopper and I'm sure MONyog will come up with solutions in future releases. Since everything is stored in SQLite databases and the repositories are documented, any SQLite client and some simple scripting is enough to get backups and workaround the limitations.<br /><br />The agent-less architecture requires the definition of users to log into the database and the OS in order to gather the information it needs. The weak point here is that the credentials, including passwords, are stored in the clear in the SQLite databases. A way to secure this is to properly limit the GRANTs for the MySQL users and <b>ssh</b> using a DSA key instead of password. Again, no showstopper for most installations, but it needs some work from Webyog's side to increase the overall system security.<br /><br />During our tests we ran against a bug in the SSH library used by MONyog. I engaged their Technical Support looking forward to evaluate their overall responsiveness. 
I have to say it was flawless, at no point they treated me in a condescending manner, made the most of the information I provided upfront and never wasted my time with scripted useless diagnostic routines. They had to provide me with a couple of binary builds, which they did in a very reasonably time frame. All in all, a great experience.<br /><h2>My Conclusion</h2>MONyog doesn't provide any silver bullet or obscure best practice advice. It gathers all the environment variables effectively and presents it in an attractive and easy to read format. It's a closed source commercial software, the architecture is quite open through scripting and with well documented repositories which provides a lot of flexibility to allow for customizations and expansions to fit any installations needs. For installations with over 100 servers it might be more challenging to manage the servers configurations and the clear credentials may not be viable for some organizations. If these 2 issues are not an impediment, I definitively recommend any MySQL DBA to download the binaries and take it for a spin. It might be the solution you were looking for to keep an eye on your set of servers while freeing some time for other tasks.<br /><br />Let me know what do you think and if you plan to be at the MySQL UC, look me up to chat. Maybe we can invite Rohit Nadhani from Webyog to join us.<div><img width="1" height="1" src="https://blogger.googleusercontent.com/tracker/8007802080401497299-6367354404289464068?l=mmatemate.blogspot.com" alt="" /></div><br/>PlanetMySQL Voting: <a href="http://planet.mysql.com/entry/vote/?entry_id=24052&vote=1&apivote=1">Vote UP</a> / <a href="http://planet.mysql.com/entry/vote/?entry_id=24052&vote=-1&apivote=1">Vote DOWN</a>";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:15:"Gerardo Narvaja";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:22;a:6:{s:4:"data";s:48:" ";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:35:"[MySQL][Spider]Spider-2.17 released";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:70:"tag:blogger.com,1999:blog-7870178081855084823.post-5250647075769387441";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:75:"http://wild-growth.blogspot.com/2010/03/mysqlspiderspider-217-released.html";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:463:"I'm pleased to announce the release of Spider storage engine version 2.17(beta).Spider is a Storage Engine for database sharding.http://spiderformysql.com/The main changes in this version are following.- Add table parameter "semi_split_read_limit".- Add server parameter "spider_semi_split_read_limit". 
This parameters are for searching performance improvement.Please see "99_change_logs.txt" in the download documents for checking other changes.Enjoy!";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Fri, 26 Mar 2010 20:54:00 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:2:{i:0;a:5:{s:4:"data";s:5:"MySQL";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:6:"Spider";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:948:"I'm pleased to announce the release of Spider storage engine version 2.17(beta).<br />Spider is a Storage Engine for database sharding.<br /><a href="http://spiderformysql.com/">http://spiderformysql.com/</a><br /><br />The main changes in this version are following.<br />- Add table parameter "semi_split_read_limit".<br />- Add server parameter "spider_semi_split_read_limit".<br /> This parameters are for searching performance improvement.<br /><br />Please see "99_change_logs.txt" in the download documents for checking other changes.<br /><br />Enjoy!<div><img width="1" height="1" src="https://blogger.googleusercontent.com/tracker/7870178081855084823-5250647075769387441?l=wild-growth.blogspot.com" alt="" /></div><br/>PlanetMySQL Voting: <a href="http://planet.mysql.com/entry/vote/?entry_id=24051&vote=1&apivote=1">Vote UP</a> / <a href="http://planet.mysql.com/entry/vote/?entry_id=24051&vote=-1&apivote=1">Vote DOWN</a>";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:13:"Kentoku SHIBA";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:23;a:6:{s:4:"data";s:53:" ";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:23:"Thoughts on “NoSQL”";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:26:"http://oddments.org/?p=317";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:26:"http://oddments.org/?p=317";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:3833:"I’ve decided to jump on the bandwagon and spill my thoughts on “NoSQL” since it’s been such a hot topic lately ([1], [2], [3], [4]). Since I work on the Drizzle project some folks would probably think I take the SQL side of the “debate,” but actually I’m pretty objective about the topic and find value in projects on both sides. Let me explain. Last November at OpenSQL Camp I assembled a panel to debate “SQL vs NoSQL.” We had folks representing a variety of projects, including Cassandra, CouchDB, Drizzle, MariaDB, MongoDB, MySQL, and PostgreSQL. Even though I realized this was a poor name for such a panel, I went with it anyways because this “debate” was really starting to heat up. 
The conclusion I was hoping for is that the two are not at odds, because the two categories of projects can peacefully co-exist in the same toolbox for data management. Beyond the panel name, even the term “NoSQL” is a bit misleading. I talked with Eric Evans (one of my new co-workers over on the Cassandra team) who reintroduced the term, and even he admits it is vague and doesn’t do the projects categorized by it any favors. What happens when Cassandra has a SQL interface stacked on top of it? Yeah. One reason for all this confusion is that for some people, the term “database” equates to “relational database.” This makes the non-relational projects look foreign because they don’t fit the database model that became “traditional” due to its popularity. Anyone who has ever read up on other database models would quickly realize relational is just one of many models, and many of the “NoSQL” projects fit quite nicely into one of these categories. The real value these new projects are providing is in their implementation details, especially with dynamic scale-out (adding new nodes to live systems) and synchronization mechanisms (eventual consistency or tunable quorum). There are a lot of great ideas in these projects, and people on the “SQL” side should really take the time to study them – there are some tricks to learn. One of the main criticisms of the “NoSQL” projects is that they are taking a step back, simply reinventing a component that already exists in a relational model. While this may have some truth, if you gloss over the high-level logical data representations, this is just wrong. Sure, it may look like a simple key-value store from the outside, but there is a lot more under the hood. For many of these projects it was a design decision to focus on the implementation details where they matter, and not bother with things like parsing SQL and optimizing joins. I think there is still some value in supporting some form of a SQL interface, because this gets you instant adoption by pretty much any developer out there. Love it or hate it, people know SQL. As for joins, scaling them with distributed relational nodes has been a research topic for years, and it’s a hard problem. People have worked around this by accepting new data models and consistency levels. It all depends on what your problem requires. I fully embrace the “NoSQL” projects out there; there is something we can all learn from them even if we don’t put them into production. We should be thrilled we have more open source tools in our database toolbox, especially non-relational ones. We are no longer required to smash every dataset “peg” into the relational “hole.” Use the best tool for the job; it may still be a relational database. Explore your options, try to learn a few things, model your data in a number of ways, and find out what is really required.
When it comes time to making a decision just remember: ";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Fri, 26 Mar 2010 20:27:34 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:3:{i:0;a:5:{s:4:"data";s:7:"Drizzle";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:4:"Main";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:5:"MySQL";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:5444:"<p>I’ve decided to jump on the bandwagon and spill my thoughts on “NoSQL” since it’s been such a hot topic lately (<a href="http://scaledb.blogspot.com/2010/02/will-nosql-movement-unseat-database.html">[1]</a>, <a href="http://www.xaprb.com/blog/2010/03/08/nosql-doesnt-mean-non-relational/">[2]</a>, <a href="http://stu.mp/2010/03/nosql-vs-rdbms-let-the-flames-begin.html">[3]</a>, <a href="http://spyced.blogspot.com/2010/03/cassandra-in-action.html">[4]</a>). Since I work on the <a href="http://drizzle.org/">Drizzle</a> project some folks would probably think I take the SQL side of the “debate,” but actually I’m pretty objective about the topic and find value in projects on both sides. Let me explain.</p> <p>Last November at <a href="http://opensqlcamp.org/Main_Page">OpenSQL Camp</a> I assembled a panel to debate “SQL vs NoSQL.” We had folks representing a variety of projects, including <a href="http://cassandra.apache.org/">Cassandra</a>, <a href="http://couchdb.apache.org/">CouchDB</a>, <a href="http://drizzle.org/">Drizzle</a>, <a href="http://askmonty.org/wiki/MariaDB">MariaDB</a>, <a href="http://www.mongodb.org/">MongoDB</a>, <a href="http://www.mysql.com/">MySQL</a>, and <a href="http://www.postgresql.org/">PostgreSQL</a>. Even though I realized this was a poor name for such a panel, I went with it anyways because this “debate” was really starting to heat up. The conclusion I was hoping for is that the two are not at odds because the two categories of projects can peacefully co-exist in the same toolbox for data management. Beyond the panel name, even the term <a href="http://en.wikipedia.org/wiki/NoSQL">“NoSQL”</a> is a bit misleading. I talked with <a href="http://blog.sym-link.com/">Eric Evans</a> (one of my <a href="http://oddments.org/?p=282">new co-workers</a> over on the Cassandra team) who reintroduced the term, and <a href="http://blog.sym-link.com/2009/10/30/nosql_whats_in_a_name.html">even he admits</a> it is vague and doesn’t do the projects categorized by it any favors. What happens when Cassandra has a SQL interface stacked on top of it? Yeah.</p> <p>One reason for all this confusion is that for some people, the term “database” equates to “relational database.” This makes the non-relational projects look foreign because they don’t fit the database model that became “traditional” due it’s popularity. Anyone who has ever read up on other <a href="http://en.wikipedia.org/wiki/Database_model">database models</a> would quickly realize relational is just one of many models, and many of the “NoSQL” projects fit quite nicely into one of these categories. 
The real value these new projects are providing are in their implementation details, especially with dynamic scale-out (adding new nodes to live systems) and synchronization mechanisms (eventual consistency or tunable quorum). There are a lot of great ideas in these projects, and people on the “SQL” side should really take the time to study them – there are some tricks to learn.</p> <p><img src="http://oddments.org/pics/squarepeg.png" alt="Square Peg, Round Hole" style="float:right" /></p> <p>One of the main criticisms of the “NoSQL” projects is that they are taking a step back, simply reinventing a component that already exists in a relational model. While this may have some truth, if you gloss over the high-level logical data representations, this is just wrong. Sure, it may look like a simple key-value store from the outside, but there is a lot more under the hood. For many of these projects it was a design decision to focus on the implementation details where it matters, and not bother with things like parsing SQL and optimizing joins. I think there is still some value in supporting some form of a SQL interface because this gets you instant adoption by pretty much any developer out there. Love it or hate it, people know SQL. As for joins, scaling them with distributed relational nodes has been a research topic for years, and it’s a hard problem. People have worked around this by accepting new data models and consistency levels. <a href="http://www.xzilla.net/blog/2010/Mar/Actually,-the-Relational-Model-doesnt-scale.html">It all depends on what your problem requires</a>.</p> <p>I fully embrace the “NoSQL” projects out there, there is something we can all learn from them even if we don’t put them into production. We should be thrilled we have more open source tools in our database toolbox, especially non-relational ones. We are no longer required to smash every dataset “peg” into the relational “hole.” Use the best tool for the job, this may still be a relational database. Explore your options, try to learn a few things, model your data in a number of ways, and find out what is really required. When it comes time to making a decision just remember:</p> <p><center><a href="http://twitter.com/dlsspy/status/1652349607"><img alt="Dear everyone who is not Facebook: You are not Facebook." 
src="http://oddments.org/pics/dustin_facebook.jpg" /></a></center></p><br/>PlanetMySQL Voting: <a href="http://planet.mysql.com/entry/vote/?entry_id=24049&vote=1&apivote=1">Vote UP</a> / <a href="http://planet.mysql.com/entry/vote/?entry_id=24049&vote=-1&apivote=1">Vote DOWN</a>";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:8:"Eric Day";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:24;a:6:{s:4:"data";s:38:" ";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:5:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:72:"Some kernel tweaks to aid Cassandra under a high concurrency environment";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:59:"tag:blogger.com,1999:blog-31421954.post-4173777374413942729";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:77:"http://mysqldba.blogspot.com/2010/03/some-kernel-tweaks-to-aid-cassandra.html";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:1773:"For the past couple of weeks I have been trouble shooting some Cassandra issues where data would not make it to Cassandra. The image above graphs all the exceptions that are produced from Cassandra. The two big lines areTransport Exceptions (te) - meaning that Cassandra could not answer the request think of this as MAX Connection errors in mySQL.Unavailable Exceptions (ue) - meaning that Cassandra could answer the request but the "storage engine" cannot do anything with it because its busy doing something like communicating with other nodes or maintenance like a node cleanup.So how did I get the graph to drop to 0? After looking at the error logs, I saw that Cassandra was getting flooded with SYN Requests and the kernel thought that it was a SYN Flood and did thispossible SYN flooding on port 9160. Sending cookies.To stop this the puppet profile was changed to havesysctl -w net.ipv4.tcp_max_syn_backlog=4096sysctl -w net.ipv4.tcp_syncookies=0Next looking into the Cassandra log which I defined to exist in /var/log/cassandra/system.logWARN [TCP Selector Manager] 2010-03-26 02:46:31,619 TcpConnectionHandler.java (line 53) Exception was generated at : 03/26/2010 02:Too many open filesjava.io.IOException: Too many open filesThen noticed that ulimit -n == 1024thus I changed/etc/security/limits.conf so that It's at a server setting by adding this:* - nofile 8000Now my Transport Exceptions and Unavailable Exceptions are gone and data is being written to it consistently.There are many other ways of doing the same thing, I could have modified my init script or did some other stuff but I choose this way. 
Default Distros set kernel and limits fields too low: settings for desktop levels.";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Fri, 26 Mar 2010 17:45:00 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:2716:"For the past couple of weeks I have been trouble shooting some Cassandra issues where data would not make it to Cassandra.<br /><br /><a href="http://www.flickr.com/photos/dathan/4464575619/" title="Graph of various tracked Exceptions by dathan, on Flickr"><img src="http://farm5.static.flickr.com/4047/4464575619_c1f48240bd_b.jpg" width="1024" height="382" alt="Graph of various tracked Exceptions" /></a> <br /><br /><br />The image above graphs all the exceptions that are produced from Cassandra. The two big lines are<br /><br />Transport Exceptions (te) - meaning that Cassandra could not answer the request think of this as MAX Connection errors in mySQL.<br /><br /><br />Unavailable Exceptions (ue) - meaning that Cassandra could answer the request but the "storage engine" cannot do anything with it because its busy doing something like communicating with other nodes or maintenance like a node cleanup.<br /><br /><br />So how did I get the graph to drop to 0? After looking at the error logs, I saw that Cassandra was getting flooded with SYN Requests and the kernel thought that it was a SYN Flood and did this<br /><br /><i>possible SYN flooding on port 9160. Sending cookies.</i><br /><br /><br />To stop this the puppet profile was changed to have<br /><br />sysctl -w net.ipv4.tcp_max_syn_backlog=4096<br />sysctl -w net.ipv4.tcp_syncookies=0<br /><br /><br /><br />Next looking into the Cassandra log which I defined to exist in /var/log/cassandra/system.log<br /><br /><blockquote><br />WARN [TCP Selector Manager] 2010-03-26 02:46:31,619 TcpConnectionHandler.java (line 53) Exception was generated at : 03/26/2010 02:<br />Too many open files<br />java.io.IOException: Too many open files<br /></blockquote><br /><br />Then noticed that <br />ulimit -n == 1024<br /><br />thus I changed<br />/etc/security/limits.conf so that It's at a server setting by adding this:<br /><pre><br />* - nofile 8000<br /></pre><br /><br />Now my Transport Exceptions and Unavailable Exceptions are gone and data is being written to it consistently.<br /><br />There are many other ways of doing the same thing, I could have modified my init script or did some other stuff but I choose this way. 
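A hedged sketch of making those tweaks survive a reboot (the values are the ones quoted in the post; the file locations are the usual Linux defaults and may differ per distro):

<pre>
# Persist the kernel settings applied above with sysctl -w.
cat >> /etc/sysctl.conf <<'EOF'
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.tcp_syncookies = 0
EOF
sysctl -p    # reload kernel settings now
# Raise the open-files limit for all users, as in the post.
echo '* - nofile 8000' >> /etc/security/limits.conf
ulimit -n    # verify from a fresh login shell; should report 8000
</pre>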
Default Distros set kernel and limits fields too low: settings for desktop levels.<div><img width="1" height="1" src="https://blogger.googleusercontent.com/tracker/31421954-4173777374413942729?l=mysqldba.blogspot.com" alt="" /></div><br/>PlanetMySQL Voting: <a href="http://planet.mysql.com/entry/vote/?entry_id=24047&vote=1&apivote=1">Vote UP</a> / <a href="http://planet.mysql.com/entry/vote/?entry_id=24047&vote=-1&apivote=1">Vote DOWN</a>";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:17:"Dathan Pattishall";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:25;a:6:{s:4:"data";s:43:" ";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:40:"More on the MySQL Audit Plugin interface";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:69:"tag:blogger.com,1999:blog-9144505959002328789.post-241486260456624319";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:89:"http://karlssonondatabases.blogspot.com/2010/03/more-on-mysql-audit-plugin-interface.html";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:1268:"I will write some more on this interface eventually, following up my previous MySQL Audit API post, and will show some ideas and hopefully push some interesting code (I have ideas!). But note that the API so far isn't well documented (only source so far), but there is work underway to fix this by the friendly MySQL docs team.Already I have realized that Audit events are different than I thought. The source of the event is currentlyt either from inside the parser code or from the general log code. The events I got looked like general log events, so I just imaginged this was the source of what I saw, and I never relaized that there was another possible source, the parser. Actually, when the general log is not on, the parser events is all you get, but as I have shown, this is usually good enough. For the log events to be received, you still have to have the general log on. 
In practice, this doesn't seem to be much of a difference, but I'll keep an eye on it once the documentation is in place, and if there is a use for having Audit general log events, without having the general log per-se on, then I will create a worklog for that (along the lines of having the general_log variables have 3 values (ON, AUDIT_ONLY and OFF) or something like that./Karlsson";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Fri, 26 Mar 2010 16:52:00 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:1:{i:0;a:5:{s:4:"data";s:15:"mysql audit api";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:1761:"I will write some more on this interface eventually, following up my previous <a href="http://karlssonondatabases.blogspot.com/2010/03/mysql-audit-plugin-api.html">MySQL Audit API </a>post, and will show some ideas and hopefully push some interesting code (I have ideas!). But note that the API so far isn't well documented (only source so far), but there is work underway to fix this by the friendly MySQL docs team.<br />Already I have realized that Audit events are different than I thought. The source of the event is currentlyt either from inside the parser code or from the general log code. The events I got looked like general log events, so I just imaginged this was the source of what I saw, and I never relaized that there was another possible source, the parser. Actually, when the general log is not on, the parser events is all you get, but as I have shown, this is usually good enough. For the log events to be received, you still have to have the general log on. 
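Since the post notes that log-sourced audit events only flow when the general log is on, and general_log is a dynamic server variable in MySQL 5.1, it can be flipped at runtime; a minimal sketch:

<pre>
# Turn the general query log on without restarting the server.
mysql -u root -e "SET GLOBAL general_log = ON;"
mysql -u root -e "SHOW GLOBAL VARIABLES LIKE 'general_log';"
</pre>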
In practice, this doesn't seem to be much of a difference, but I'll keep an eye on it once the documentation is in place, and if there is a use for having Audit general log events, without having the general log per-se on, then I will create a worklog for that (along the lines of having the general_log variables have 3 values (ON, AUDIT_ONLY and OFF) or something like that.<br /><br />/Karlsson<div><img width="1" height="1" src="https://blogger.googleusercontent.com/tracker/9144505959002328789-241486260456624319?l=karlssonondatabases.blogspot.com" alt="" /></div><br/>PlanetMySQL Voting: <a href="http://planet.mysql.com/entry/vote/?entry_id=24046&vote=1&apivote=1">Vote UP</a> / <a href="http://planet.mysql.com/entry/vote/?entry_id=24046&vote=-1&apivote=1">Vote DOWN</a>";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:15:"Anders Karlsson";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:26;a:6:{s:4:"data";s:53:" ";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:42:"How well do your tables fit in buffer pool";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:43:"http://www.mysqlperformanceblog.com/?p=2390";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:69:"http://www.mysqlperformanceblog.com/2010/03/26/tables-fit-buffer-poo/";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:2804:"In XtraDB we have the table INNODB_BUFFER_POOL_PAGES_INDEX which shows which pages belong to which indexes in which tables. Using thing information and standard TABLES table we can see how well different tables fit in buffer pool. PLAIN TEXT SQL: mysql> SELECT d.*,round(100*cnt*16384/(data_length+index_length),2) fit FROM (SELECT schema_name,table_name,count(*) cnt,sum(dirty),sum(hashed) FROM INNODB_BUFFER_POOL_PAGES_INDEX GROUP BY schema_name,table_name ORDER BY cnt DESC LIMIT 20) d JOIN TABLES ON (TABLES.table_schema=d.schema_name AND TABLES.table_name=d.table_name); +-------------+---------------------+---------+------------+-------------+--------+ | schema_name | table_name | cnt | sum(dirty) | sum(hashed) | fit | +-------------+---------------------+---------+------------+-------------+--------+ | db | table1 | 1699133 | 13296 | 385841 | 87.49 | | db | table2 | 1173272 | 17399 | 11099 | 98.42 | | db | table3 | 916641 | 7849 | 15316 | 94.77 | | db | table4 | 86999 | 1555 | 75554 | 87.42 | | db | table5 | 32701 | 7997 | 30082 | 91.61 | | db | table6 | 31990 | 4495 | 25681 | 102.97 | | db | table7 | 1 | 0 | 0 | 100.00 | +-------------+---------------------+---------+------------+-------------+--------+ 7 rows IN SET (26.45 sec) You can also see in one of the cases the value shown is a bit over 100% - I am not sure where it comes from but more pages reported to belong to the table in buffer pool than on disk. Though it seems to work well enough for estimation purposes. 
Entry posted by peter | One comment";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Fri, 26 Mar 2010 16:28:55 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:3:{i:0;a:5:{s:4:"data";s:6:"Innodb";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:5:"mysql";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:6:"xtradb";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:6365:"<p>In XtraDB we have the table INNODB_BUFFER_POOL_PAGES_INDEX, which shows which pages belong to which indexes in which tables. Using this information and the standard TABLES table we can see how well different tables fit in the buffer pool.</p> <pre>
mysql> SELECT d.*,round(100*cnt*16384/(data_length+index_length),2) fit FROM (SELECT schema_name,table_name,count(*) cnt,sum(dirty),sum(hashed) FROM INNODB_BUFFER_POOL_PAGES_INDEX GROUP BY schema_name,table_name ORDER BY cnt DESC LIMIT 20) d JOIN TABLES ON (TABLES.table_schema=d.schema_name AND TABLES.table_name=d.table_name);
+-------------+------------+---------+------------+-------------+--------+
| schema_name | table_name | cnt     | sum(dirty) | sum(hashed) | fit    |
+-------------+------------+---------+------------+-------------+--------+
| db          | table1     | 1699133 |      13296 |      385841 |  87.49 |
| db          | table2     | 1173272 |      17399 |       11099 |  98.42 |
| db          | table3     |  916641 |       7849 |       15316 |  94.77 |
| db          | table4     |   86999 |       1555 |       75554 |  87.42 |
| db          | table5     |   32701 |       7997 |       30082 |  91.61 |
| db          | table6     |   31990 |       4495 |       25681 | 102.97 |
| db          | table7     |       1 |          0 |           0 | 100.00 |
+-------------+------------+---------+------------+-------------+--------+
7 rows in set (26.45 sec)
</pre> <p>You can also see that in one of the cases the value shown is a bit over 100% - I am not sure where that comes from, as more pages are reported to belong to the table in the buffer pool than exist on disk. Still, it seems to work well enough for estimation purposes.</p> <p>Entry posted by peter | <a href="http://www.mysqlperformanceblog.com/2010/03/26/tables-fit-buffer-poo/#comments">One comment</a></p>";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:22:"MySQL Performance Blog";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}
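As a rough companion to the fit query above, the same estimate can be run for a single table ('db' and 'table1' are placeholders from the post's anonymized output; 16384 is the InnoDB page size, and INNODB_BUFFER_POOL_PAGES_INDEX exists in XtraDB only, not stock MySQL):

<pre>
# Hedged sketch: percentage of one table's data+index pages held in the
# buffer pool, per the formula used in the post.
mysql information_schema -e "
SELECT ROUND(100 * COUNT(*) * 16384 / (t.data_length + t.index_length), 2) AS fit
FROM INNODB_BUFFER_POOL_PAGES_INDEX b
JOIN TABLES t ON (t.table_schema = b.schema_name AND t.table_name = b.table_name)
WHERE b.schema_name = 'db' AND b.table_name = 'table1';"
</pre>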
i:27;a:6:{s:4:"data";s:53:" ";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:34:"Holy Google Summer of Code, Batman";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:28:"http://www.joinfu.com/?p=356";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:64:"http://www.joinfu.com/2010/03/holy-google-summer-of-code-batman/";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:1745:"So, last year, Drizzle participated in the Google Summer of
Code under the MySQL project organization. We had four excellent student submissions and myself, Monty Taylor, Eric Day and Stewart Smith all mentored students for the summer. It was my second year mentoring, and I really enjoyed it, so I was looking forward to this year’s summer of code. This year, Padraig O’Sullivan, a GSoC student last year, is now working at Akiban Technologies, partly on Drizzle, and is the GSoC Adminsitrator and also a mentor for Drizzle this year, and Drizzle is its own sponsored project organization this year. Thank you, Padraig! I have been absolutely floored by the flood of potential students who have shown up on the mailing list and the #drizzle IRC channel. I have been even more impressed with those students’ ambition, sense of community, and willingness to ask questions and help other students as they show up. A couple students have even gotten code contributed to the source trees even before submitting their official applications to GSoC. See, I told you they were ambitious! This year, Drizzle has a listing of 16 potential projects for students to work on. The projects are for students interested in developing in C++, Python, or Perl. If you are interested in participating, please do check out Drizzle! For those new to Launchpad, Bazaar, and C++ development with Drizzle, feel free to check out these blog articles which cover those topics: A Contributor’s Guide to Launchpad and Bazaar – Part 1 – Getting Started A Contributor’s Guide to Launchpad and Bazaar – Part 2 – Code Management Getting a C++ Development Enviroment Established And, in other news, Go Buckeyes!";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Fri, 26 Mar 2010 16:06:02 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:3:{i:0;a:5:{s:4:"data";s:5:"C/C++";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:7:"Drizzle";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:5:"MySQL";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:3030:"<p>So, last year, <a href="http://launchpad.net/drizzle">Drizzle</a> <a href="http://socghop.appspot.com/gsoc/program/list_projects/google/gsoc2009">participated</a> in the Google Summer of Code under the MySQL project organization. We had four excellent student submissions and myself, <a href="http://inaugust.com">Monty Taylor</a>, <a href="http://oddments.org">Eric Day</a> and <a href="http://flamingspork.com">Stewart Smith</a> all mentored students for the summer. It was my second year mentoring, and I really enjoyed it, so I was looking forward to this year’s summer of code.</p> <p>This year, <a href="http://posulliv.com">Padraig O’Sullivan</a>, a GSoC student last year, is now working at <a href="http://www.akiban.com/">Akiban Technologies</a>, partly on Drizzle, and is the GSoC Adminsitrator and also a mentor for Drizzle this year, and <em>Drizzle is its own <a href="http://socghop.appspot.com/gsoc/program/accepted_orgs/google/gsoc2010">sponsored project organization</a> this year</em>. 
Thank you, Padraig!</p> <p>I have been absolutely floored by the flood of potential students who have shown up on the <a href="https://lists.launchpad.net/drizzle-discuss/">mailing list</a> and the #drizzle IRC channel. I have been even more impressed with those students’ ambition, sense of community, and willingness to ask questions and help other students as they show up. A couple students have even gotten code contributed to the source trees even before submitting their official applications to GSoC. See, I told you they were ambitious! <img src="http://www.joinfu.com/wp-includes/images/smilies/icon_smile.gif" alt=":)" class="wp-smiley" /> </p> <p>This year, Drizzle has a <a href="http://drizzle.org/wiki/Soc">listing of 16 potential projects</a> for students to work on. The projects are for students interested in developing in C++, Python, or Perl.</p> <p>If you are interested in participating, please do check out Drizzle! For those new to Launchpad, Bazaar, and C++ development with Drizzle, feel free to check out these blog articles which cover those topics:</p> <ul> <li><a href="http://www.joinfu.com/2008/08/a-contributors-guide-to-launchpadnet-part-1-getting-started/">A Contributor’s Guide to Launchpad and Bazaar – Part 1 – Getting Started</a></li> <li><a href="http://www.joinfu.com/2008/08/a-contributors-guide-to-launchpadnet-part-2-code-management/">A Contributor’s Guide to Launchpad and Bazaar – Part 2 – Code Management</a></li> <li><a href="http://www.joinfu.com/2008/08/getting-a-working-c-c-plusplus-development-environment-for-developing-drizzle/">Getting a C++ Development Enviroment Established</a></li> </ul> <p>And, in other news, <a href="http://www.ncaa.com/brackets/basketball/men/">Go Buckeyes</a>!</p><br/>PlanetMySQL Voting: <a href="http://planet.mysql.com/entry/vote/?entry_id=24044&vote=1&apivote=1">Vote UP</a> / <a href="http://planet.mysql.com/entry/vote/?entry_id=24044&vote=-1&apivote=1">Vote DOWN</a>";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:9:"Jay Pipes";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:28;a:6:{s:4:"data";s:48:" ";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:29:"How do we measure innovation?";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:37:"tag:radar.oreilly.com,2010://57.39462";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:98:"http://feedproxy.google.com/~r/oreilly/radar/atom/~3/hsV291jAvqU/how-do-we-measure-innovation.html";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:1976:"In response to the IEEE's report on Patent Power, which lists the top companies ranked by number of patents, Ari Shahdadi and Brad Burnham made trenchant comments in email that I thought were worth sharing (with their permission): Ari wrote: The main article is sad to read, with choice quotes like this: "Clearly, the global recession seriously hampered innovation in the United States." 
If I'd like to do anything, it's end the use of patenting statistics as a metric for innovative activity, especially by groups like the IEEE. Brad responded: Amen - R&D spending is also a bad indicator because so much is wasted in big companies. The methodology should have something to do with end user utility. Facebook has had a bigger impact on more lives than IBM and they don’t spend a fraction of what IBM spends on R&D or on patents. I totally agree with both Ari and Brad, but just wishing that people would use another metric won't make it happen. How might we construct a metric that would reflect the transformative power of the web (no patents), Google (nowhere near as many as their innovations), Facebook (ditto), Amazon (ditto, despite the 1-click flap), Craigslist, Wikipedia, not to mention free software such as Linux, Apache, MySQL and friends, as well the upwelling of innovation in media, maker culture, robotics... you name it: all the areas where small companies create new value and don't have time, money or inclination to divert effort from innovation to patents? I've long been mindful of the power of synthetic indexes. How many people who religiously check the Dow or the Nasdaq know which companies it actually represents? It seems to me that there ought to be a way to measure the introduction of new products, and rank them by novelty and by widespread acceptance, in some way that reflects a more substantial measure of innovation and its impact on the economy. I'd love your thoughts about what could go into such a measure. ";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Fri, 26 Mar 2010 15:44:03 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:2:{i:0;a:5:{s:4:"data";s:10:"innovation";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:7:"patents";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:3569:"<p>In response to the IEEE's report on <a href="http://spectrum.ieee.org/static/patentpower2010">Patent Power</a>, which lists the top companies ranked by number of patents, <a href="http://www.google.com/profiles/111617526519871515953#buzz">Ari Shahdadi</a> and <a href="http://twitter.com/bradusv">Brad Burnham</a> made trenchant comments in email that I thought were worth sharing (with their permission):<br /> <p><br /> Ari wrote: <br /> <blockquote><br /> The <a href="http://spectrum.ieee.org/at-work/innovation/patent-power-scorecards-japan-ascendant">main article</a> is sad to read, with choice quotes like this: "Clearly, the global recession seriously hampered innovation in the United States." If I'd like to do anything, it's end the use of patenting statistics as a metric for innovative activity, especially by groups like the IEEE.<br /> </blockquote><br /> Brad responded: <br /> <blockquote><br /> Amen - R&D spending is also a bad indicator because so much is wasted in big companies. The methodology should have something to do with end user utility. Facebook has had a bigger impact on more lives than IBM and they don’t spend a fraction of what IBM spends on R&D or on patents.<br /> </blockquote><br /> I totally agree with both Ari and Brad, but just wishing that people would use another metric won't make it happen. 
How might we construct a metric that would reflect the transformative power of the web (no patents), Google (nowhere near as many as their innovations), Facebook (ditto), Amazon (ditto, despite the 1-click flap), Craigslist, Wikipedia, not to mention free software such as Linux, Apache, MySQL and friends, as well the upwelling of innovation in media, maker culture, robotics... you name it: all the areas where small companies create new value and don't have time, money or inclination to divert effort from innovation to patents?<br /> <p><br /> I've long been mindful of the power of synthetic indexes. How many people who religiously check the Dow or the Nasdaq know which companies it actually represents?<br /> <p><br /> It seems to me that there ought to be a way to measure the introduction of new products, and rank them by novelty and by widespread acceptance, in some way that reflects a more substantial measure of innovation and its impact on the economy. <br /> <p><br /> I'd love your thoughts about what could go into such a measure.</p> <div> <a href="http://feeds.feedburner.com/~ff/oreilly/radar/atom?a=hsV291jAvqU:UNgjCpIlVkI:V_sGLiPBpWU"><img src="http://feeds.feedburner.com/~ff/oreilly/radar/atom?i=hsV291jAvqU:UNgjCpIlVkI:V_sGLiPBpWU" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/oreilly/radar/atom?a=hsV291jAvqU:UNgjCpIlVkI:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/oreilly/radar/atom?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/oreilly/radar/atom?a=hsV291jAvqU:UNgjCpIlVkI:JEwB19i1-c4"><img src="http://feeds.feedburner.com/~ff/oreilly/radar/atom?i=hsV291jAvqU:UNgjCpIlVkI:JEwB19i1-c4" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/oreilly/radar/atom?a=hsV291jAvqU:UNgjCpIlVkI:7Q72WNTAKBA"><img src="http://feeds.feedburner.com/~ff/oreilly/radar/atom?d=7Q72WNTAKBA" border="0"></img></a> </div><img src="http://feeds.feedburner.com/~r/oreilly/radar/atom/~4/hsV291jAvqU" height="1" width="1" /><br/>PlanetMySQL Voting: <a href="http://planet.mysql.com/entry/vote/?entry_id=24045&vote=1&apivote=1">Vote UP</a> / <a href="http://planet.mysql.com/entry/vote/?entry_id=24045&vote=-1&apivote=1">Vote DOWN</a>";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:12:"Tim O'Reilly";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:29;a:6:{s:4:"data";s:53:" ";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:43:"MONyog MySQL Monitor 3.73 Has Been Released";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:34:"http://www.webyog.com/blog/?p=1652";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:82:"http://www.webyog.com/blog/2010/03/26/monyog-mysql-monitor-3-73-has-been-released/";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:559:" Changes (as compared to 3.72) include: * If SHOW ENGINE INNODB STATUS returned an error that was not privilege-related, MONyog reported MySQL as non-available. 
That could happen, for instance, if MySQL was started with the --skip-innodb option. This bug was introduced in 3.71 with the support for InnoDB deadlock detection. * On Linux, a bug in the MONyog startup script could cause MONyog to still be reported as running after it had been killed or had crashed. Downloads: http://webyog.com/en/downloads.php Purchase: http://webyog.com/en/buy.php";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Fri, 26 Mar 2010 09:06:07 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:3:{i:0;a:5:{s:4:"data";s:6:"MONyog";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:5:"MySQL";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:8:"Releases";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:954:"<p><strong> Changes (as compared to 3.72) include:</strong></p> <p>* If SHOW ENGINE INNODB STATUS returned an error that was not privilege-related, MONyog reported MySQL as non-available. That could happen, for instance, if MySQL was started with the --skip-innodb option. This bug was introduced in 3.71 with the support for InnoDB deadlock detection.<br /> * On Linux, a bug in the MONyog startup script could cause MONyog to still be reported as running after it had been killed or had crashed.</p> <p><strong>Downloads:</strong> <a href="http://webyog.com/en/downloads.php">http://webyog.com/en/downloads.php</a><br /> <strong>Purchase:</strong> <a href="http://webyog.com/en/buy.php">http://webyog.com/en/buy.php</a></p>";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:6:"Webyog";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:30;a:6:{s:4:"data";s:48:" ";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:20:"Pay now or pay later";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:70:"tag:blogger.com,1999:blog-5915567578707286635.post-2485659573300814766";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:61:"http://mysqlha.blogspot.com/2010/03/pay-now-or-pay-later.html";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:951:"I think I have rpl_transaction_enabled working for MySQL 5.1 and will publish the patch after more testing. I hope to never port this again but that depends on whether the distribution I use provides an equivalent feature. 
Apparently people in operations enjoy not having to restore slaves after hardware and software crashes. Some features require payment up front. They either cost a lot for developers to implement or for users to deploy. Others avoid the up front costs but require payment down the road by users who encounter many problems. I think that MySQL replication has been on the wrong side of this trade off for too long. But things are changing as the replication team has done a lot of good things for the past few years. I am sure if we follow Mats around at the User conference we can find out what is coming. MySQL has to improve to remain competitive as PostgreSQL and others have compelling features pending or available now.";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Fri, 26 Mar 2010 04:17:00 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:2:{i:0;a:5:{s:4:"data";s:4:"rant";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:5:"mysql";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:1614:"I think I have <a href="http://code.google.com/p/google-mysql-tools/wiki/TransactionalReplication">rpl_transaction_enabled</a> working for MySQL 5.1 and will publish the patch after more testing. I hope to never port this again but that depends on whether the distribution I use provides an equivalent feature. Apparently people in operations enjoy not having to restore slaves after hardware and software crashes.<br /><br />Some features require payment up front. They either cost a lot for developers to implement or for users to deploy. Others avoid the up front costs but require payment down the road by users who encounter many problems. I think that MySQL replication has been on the wrong side of this trade off for too long. But things are changing as the replication team has done a lot of good things for the past few years. 
I am sure if we <a href="http://en.oreilly.com/mysql2010/public/schedule/speaker/3122">follow Mats around</a> at the User conference we can find out what is coming.<br /><br />MySQL has to improve to remain competitive <a href="http://scale-out-blog.blogspot.com/2009/02/simple-ha-with-postgresql-point-in-time.html">as PostgreSQL</a> and others have compelling features pending or available now.";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:14:"Mark Callaghan";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:31;a:6:{s:4:"data";s:58:" ";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:22:"Announcing TokuDB v3.1";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:26:"http://tokutek.com/?p=1246";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:50:"http://tokutek.com/2010/03/announcing-tokudb-v3-1/";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:1131:"Tokutek is pleased to announce immediate availability of TokuDB for MySQL, version 3.1. It is designed for continuous querying and analysis of large volumes of rapidly arriving and changing data, while maintaining full ACID properties. 
TokuDB v3.1’s new functionality includes: Improved handling of a full disk Configurable disk space reserve Faster group commits Faster crash recovery Improved SHOW ENGINE STATUS and SHOW PROCESSLIST diagnostics This new release builds on TokuDB’s core benefits: 10x-50x faster indexing for faster querying Full support for ACID transactions Short recovery time (seconds or minutes, not hours or days) Immunity to database aging to eliminate performance degradation and maintenance headaches 5x-15x data compression for reduced disk use and lower storage costs Because of its high indexing performance and transaction support, TokuDB is well suited to Web applications that must simultaneously store and query large volumes of rapidly arriving data, including: Social Networking eCommerce Personalization Logfile Analysis High-speed Webcrawling Real-time clickstream analysis ";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Thu, 25 Mar 2010 21:03:23 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:4:{i:0;a:5:{s:4:"data";s:8:"TokuView";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:12:"Announcement";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:5:"mysql";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:3;a:5:{s:4:"data";s:6:"TokuDB";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:1597:"<p>Tokutek is pleased to announce immediate availability of <a href="http://tokutek.com/products/tokudb-for-mysql-v3/">TokuDB for MySQL, version 3.1</a>. 
It is designed for continuous querying and analysis of large volumes of rapidly arriving and changing data, while maintaining full ACID properties.</p> <p>TokuDB v3.1’s new functionality includes:</p> <ul> <li>Improved handling of a full disk</li> <li>Configurable disk space reserve</li> <li>Faster group commits</li> <li>Faster crash recovery</li> <li>Improved SHOW ENGINE STATUS and SHOW PROCESSLIST diagnostics</li> </ul> <p>This new release builds on TokuDB’s core benefits:</p> <ul> <li>10x-50x faster indexing for faster querying</li> <li>Full support for ACID transactions</li> <li>Short recovery time (seconds or minutes, not hours or days)</li> <li>Immunity to database aging to eliminate performance degradation and maintenance headaches</li> <li>5x-15x data compression for reduced disk use and lower storage costs</li> </ul> <p>Because of its high indexing performance and transaction support, TokuDB is well suited to Web applications that must simultaneously store and query large volumes of rapidly arriving data, including:</p> <ul> <li>Social Networking</li> <li>eCommerce Personalization</li> <li>Logfile Analysis</li> <li>High-speed Webcrawling</li> <li>Real-time clickstream analysis</li> </ul><br/>PlanetMySQL Voting: <a href="http://planet.mysql.com/entry/vote/?entry_id=24040&vote=1&apivote=1">Vote UP</a> / <a href="http://planet.mysql.com/entry/vote/?entry_id=24040&vote=-1&apivote=1">Vote DOWN</a>";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:13:"Tokuview Blog";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:32;a:6:{s:4:"data";s:43:" ";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:23:"Bayesian classification";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:34:"http://explainextended.com/?p=4598";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:62:"http://explainextended.com/2010/03/25/bayesian-classification/";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:7569:"From Stack Overflow: Suppose you’ve visited sites S0 … S50. All except S0 are 48% female; S0 is 100% male. I’m guessing your gender, and I want to have a value close to 100%, not just the 49% that a straight average would give. Also, consider that most demographics (i.e. everything other than gender) does not have the average at 50%. For example, the average probability of having kids 0-17 is ~37%. The more a given site’s demographics are different from this average (e.g. maybe it’s a site for parents, or for child-free people), the more it should count in my guess of your status. What’s the best way to calculate this? This is a classical application of Bayes’ Theorem. 
The formula to calculate the posterior probability is: P(A|B) = P(B|A) × P(A) / P(B) = P(B|A) × P(A) / (P(B|A) × P(A) + P(B|A*) × P(A*)) , where: P(A|B) is the posterior probability of the visitor being a male (given that he visited the site) P(A) is the prior probability of the visitor being a male (initially, 50%) P(B) is the probability of (any Internet user) visiting the site P(B|A) is the probability of a user visiting the site, given that he is a male P(A*) is the prior probability of the visitor not being a male (initially, 50%) P(B|A*) is the probability of a user visiting the site, given that she is not a male. Since a user can only be male or female: P(A|B) = P(B|A)×P(A)/P(B) = P(B|A)×P(A) / (P(B|A)×P(A) + (1 - P(B|A))×(1 - P(A))) P(B|A) is the number stored in the database (probability of the user being a male). We consider the events of visiting the different sites to be independent (the fact that the user visited site A neither influences nor is influenced by the fact that the user also visited site B). This is of course not so, since the sites may exchange links etc., but we make this assumption for the sake of simplicity. So, given a series of the sites, we take the initial probability (P0 = 0.5) and recursively substitute it into the following formula: Pn = Sn×Pn-1 / (Sn×Pn-1 + (1 - Sn)×(1 - Pn-1)) Simple calculations show us that the recursion (which MySQL is not good at) can be replaced with an aggregate formula: P = P0 * PROD(S) / (P0 * PROD(S) + (1 - P0) * PROD(1 - S)) , where PROD is the aggregate product of the sites’ probabilities of their visitors being a male. SQL does not have a built-in aggregate product function, but it can be easily replaced by the aggregate sum on the logarithmic scale: P = P0 * EXP(SUM(LN(S))) / (P0 * EXP(SUM(LN(S))) + (1 - P0) * EXP(SUM(LN(1 - S)))) Given all that, let’s create some sample tables: Table creation details CREATE TABLE filler ( id INT NOT NULL PRIMARY KEY AUTO_INCREMENT ) ENGINE=Memory; CREATE TABLE t_user ( id INT NOT NULL PRIMARY KEY, name VARCHAR(20) NOT NULL, gender CHAR(1) NOT NULL ) ENGINE=InnoDB DEFAULT CHARSET=UTF8; CREATE TABLE t_site ( id INT NOT NULL PRIMARY KEY, name VARCHAR(20) NOT NULL, male DOUBLE NOT NULL ) ENGINE=InnoDB DEFAULT CHARSET=UTF8; CREATE TABLE t_visit ( u_id INT NOT NULL, s_id INT NOT NULL, PRIMARY KEY (u_id, s_id) ) ENGINE=InnoDB DEFAULT CHARSET=UTF8; DELIMITER $$ CREATE PROCEDURE prc_filler(cnt INT) BEGIN DECLARE _cnt INT; SET _cnt = 1; WHILE _cnt <= cnt DO INSERT INTO filler SELECT _cnt; SET _cnt = _cnt + 1; END WHILE; END $$ DELIMITER ; START TRANSACTION; CALL prc_filler(1000); COMMIT; INSERT INTO t_user SELECT id, CONCAT('User ', id), CASE WHEN RAND(20100325) > 0.5 THEN 'M' ELSE 'F' END FROM filler; INSERT INTO t_site SELECT id, CONCAT('Site ', id), RAND(20100325 << 1) * 0.94 + 0.03 FROM filler; INSERT INTO t_visit SELECT u_id, s_id FROM ( SELECT u.id AS u_id, s.id AS s_id, u.gender, s.male, RAND(20100325 << 2) AS rnds, RAND(20100325 << 3) AS rndm FROM t_user u CROSS JOIN t_site s ) q WHERE rnds < 0.05 AND rndm < CASE gender WHEN 'M' THEN male ELSE 1 - male END; There are 1,000 users and 1,000 sites. The sites are assigned a maleness from 0.03 to 0.97. Users randomly visit the sites according to their gender and the site gender distribution. There are 25 visits per user on average. Let’s try to guess the users’ gender and return only wrong guesses. 
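As a quick aside, not from the original post: the log-scale rewrite above can be sanity-checked on a tiny hand-computed case before trusting the full query. A minimal sketch, assuming a 0.5 prior and three invented sites with maleness 0.9, 0.8 and 0.4: by direct multiplication, PROD(S) = 0.9 × 0.8 × 0.4 = 0.288 and PROD(1 - S) = 0.1 × 0.2 × 0.6 = 0.012, so the posterior should be 0.288 / (0.288 + 0.012) = 0.96, and the aggregate form returns the same number: <pre>
-- Hypothetical sanity check of the aggregate log-sum formula;
-- the three maleness values are invented for illustration.
SELECT 0.5 * EXP(SUM(LN(male))) /
       (0.5 * EXP(SUM(LN(male))) + 0.5 * EXP(SUM(LN(1 - male)))) AS posterior
FROM (SELECT 0.9 AS male UNION ALL SELECT 0.8 UNION ALL SELECT 0.4) s;
-- Returns 0.96, matching the hand computation.
</pre> One caveat: LN(S) is undefined for S = 0 and LN(1 - S) for S = 1, so the rewrite assumes every site probability stays strictly between 0 and 1, which the sample data (0.03 to 0.97) guarantees.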
We will assume that the user is male when the posterior probability of the user being male is more than 0.99, female if that is less than 0.01, undefined if within 0.01 and 0.99: SELECT *, CASE WHEN posterior < 0.01 THEN 'F' WHEN posterior > 0.99 THEN 'M' ELSE 'U' END AS guessed FROM ( SELECT u.*, prior * EXP(SUM(LN(male))) / (prior * EXP(SUM(LN(male))) + (1 - prior) * EXP(SUM(LN(1 - male)))) AS posterior FROM ( SELECT 0.5 AS prior ) vars CROSS JOIN t_user u LEFT JOIN t_visit v ON v.u_id = u.id LEFT JOIN t_site s ON s.id = v.s_id GROUP BY u.id ) q HAVING guessed <> gender id name gender posterior guessed 51 User 51 F 0.652234131074669 U 53 User 53 F 0.87625067361204 U 94 User 94 M 0.732238662361337 U 264 User 264 F 0.0520209347475727 U 475 User 475 M 0.974230285094509 U 497 User 497 M 0.966568719694869 U 542 User 542 F 0.0685609699288645 U 595 User 595 M 0.984478426560255 U 742 User 742 F 0.0334681988009631 U 768 User 768 M 0.960799229888108 U 800 User 800 F 0.0181411777994256 U 867 User 867 F 0.0401728770664721 U 882 User 882 M 0.884671868426923 U 902 User 902 F 0.802525467489821 U 14 rows fetched in 0.0006s (0.1989s) id select_type table type possible_keys key key_len ref rows filtered Extra 1 PRIMARY <derived2> ALL 1000 100.00 2 DERIVED <derived3> system 1 100.00 Using temporary; Using filesort 2 DERIVED u ALL 871 100.00 2 DERIVED v ref PRIMARY PRIMARY 4 20100325_bayes.u.id 12 100.00 Using index 2 DERIVED s eq_ref PRIMARY PRIMARY 4 20100325_bayes.v.s_id 1 100.00 3 DERIVED No tables used select `q`.`id` AS `id`,`q`.`name` AS `name`,`q`.`gender` AS `gender`,`q`.`posterior` AS `posterior`,(case when (`q`.`posterior` < 0.01) then 'F' when (`q`.`posterior` > 0.99) then 'M' else 'U' end) AS `guessed` from (select `20100325_bayes`.`u`.`id` AS `id`,`20100325_bayes`.`u`.`name` AS `name`,`20100325_bayes`.`u`.`gender` AS `gender`,(('0.5' * exp(sum(ln(`20100325_bayes`.`s`.`male`)))) / (('0.5' * exp(sum(ln(`20100325_bayes`.`s`.`male`)))) + ((1 - '0.5') * exp(sum(ln((1 - `20100325_bayes`.`s`.`male`))))))) AS `posterior` from (select 0.5 AS `prior`) `vars` join `20100325_bayes`.`t_user` `u` left join `20100325_bayes`.`t_visit` `v` on((`20100325_bayes`.`v`.`u_id` = `20100325_bayes`.`u`.`id`)) left join `20100325_bayes`.`t_site` `s` on((`20100325_bayes`.`s`.`id` = `20100325_bayes`.`v`.`s_id`)) where 1 group by `20100325_bayes`.`u`.`id`) `q` having (convert(`guessed` using utf8) <> `q`.`gender`) From 1,000 users, we only have 14 results outside the credible interval of 99% and all of these are undefined rather than false.";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Thu, 25 Mar 2010 20:00:58 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:1:{i:0;a:5:{s:4:"data";s:5:"MySQL";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:10845:"<p>From <a href="http://stackoverflow.com/questions/2448522/mysql-stats-weighting-an-average-to-accentuate-differences-from-the-mean"><strong>Stack Overflow</strong></a>:</p> <blockquote><p>Suppose you’ve visited sites <strong>S0 … S50</strong>. 
All except <strong>S0</strong> are <strong>48%</strong> female; <strong>S0</strong> is <strong>100%</strong> male.</p> <p>I’m guessing your gender, and I want to have a value close to <strong>100%</strong>, not just the <strong>49%</strong> that a straight average would give.</p> <p>Also, consider that most demographics (i.e. everything other than gender) does not have the average at <strong>50%</strong>. For example, the average probability of having kids <strong>0-17</strong> is <strong>~37%</strong>.</p> <p>The more a given site’s demographics are different from this average (e.g. maybe it’s a site for parents, or for child-free people), the more it should count in my guess of your status.</p> <p>What’s the best way to calculate this?</p></blockquote> <p>This is a classical application of <a href="http://en.wikipedia.org/wiki/Bayes'_theorem">Bayes’ Theorem</a>.</p> <p>The formula to calculate the posterior probability is:</p> <p><code>P(A|B) = P(B|A) × P(A) / P(B) = P(B|A) × P(A) / (P(B|A) × P(A) + P(B|A<sup>*</sup>) × P(A<sup>*</sup>))</code></p> <p>, where:</p> <ul> <li><code>P(A|B)</code> is the posterior probability of the visitor being a male (given that he visited the site)</li> <li><code>P(A)</code> is the prior probability of the visitor being a male (initially, <strong>50%</strong>)</li> <li><code>P(B)</code> is the probability of (any Internet user) visiting the site</li> <li><code>P(B|A)</code> is the probability of a user visiting the site, given that he is a male</li> <li><code>P(A<sup>*</sup>)</code> is the prior probability of the visitor not being a male (initially, <strong>50%</strong>)</li> <li><code>P(B|A<sup>*</sup>)</code> is the probability of a user visiting the site, given that she is not a male.</li> </ul> <p><span></span><br /> Since a user can only be male or female:</p> <p><code>P(A|B) = P(B|A)×P(A)/P(B) = P(B|A)×P(A) / (P(B|A)×P(A) + (1 - P(B|A))×(1 - P(A)))</code></p> <p><code>P(B|A)</code> is the number stored in the database (probability of the user being a male).</p> <p>We consider the events of visiting the different sites to be independent (a fact that the user visited site <strong>A</strong> neither influences nor is influenced by the fact that the user also visited site <strong>B</strong>. 
This is of course not so, since the sites may exchange links etc., but we make this assumption for the sake of simplicity.</p> <p>So, given a series of the sites, we take the initial probability (<code>P<sub>0</sub> = 0.5</code>) and recursively substitute it into the following formula:</p> <p><code>P<sub>n</sub> = S<sub>n</sub>×P<sub>n-1</sub> / (S<sub>n</sub>×P<sub>n-1</sub> + (1 - S<sub>n</sub>)×(1 - P<sub>n-1</sub>))</code></p> <p>Simple calculations show us that the recursion (which <strong>MySQL</strong> is not good at) can be replaced with an aggregate formula:</p> <p><code>P = P0 * PROD(S) / (P0 * PROD(S) + (1 - P0) * PROD(1 - S))</code></p> <p>, where <code>PROD</code> is the aggregate product of the sites’ probabilities of their visitors being a male.</p> <p><strong>SQL</strong> does not have a built-in aggregate product function, but it can be easily replaced by the aggregate sum on the logarithmic scale:</p> <p><code>P = P0 * EXP(SUM(LN(S))) / (P0 * EXP(SUM(LN(S))) + (1 - P0) * EXP(SUM(LN(1 - S))))</code></p> <p>Given all that, let’s create some sample tables:</p> <p><a href="http://explainextended.com"><strong>Table creation details</strong></a><br /> </p> <div> <pre> CREATE TABLE filler ( id INT NOT NULL PRIMARY KEY AUTO_INCREMENT ) ENGINE=Memory; CREATE TABLE t_user ( id INT NOT NULL PRIMARY KEY, name VARCHAR(20) NOT NULL, gender CHAR(1) NOT NULL ) ENGINE=InnoDB DEFAULT CHARSET=UTF8; CREATE TABLE t_site ( id INT NOT NULL PRIMARY KEY, name VARCHAR(20) NOT NULL, male DOUBLE NOT NULL ) ENGINE=InnoDB DEFAULT CHARSET=UTF8; CREATE TABLE t_visit ( u_id INT NOT NULL, s_id INT NOT NULL, PRIMARY KEY (u_id, s_id) ) ENGINE=InnoDB DEFAULT CHARSET=UTF8; DELIMITER $$ CREATE PROCEDURE prc_filler(cnt INT) BEGIN DECLARE _cnt INT; SET _cnt = 1; WHILE _cnt <= cnt DO INSERT INTO filler SELECT _cnt; SET _cnt = _cnt + 1; END WHILE; END $$ DELIMITER ; START TRANSACTION; CALL prc_filler(1000); COMMIT; INSERT INTO t_user SELECT id, CONCAT('User ', id), CASE WHEN RAND(20100325) > 0.5 THEN 'M' ELSE 'F' END FROM filler; INSERT INTO t_site SELECT id, CONCAT('Site ', id), RAND(20100325 << 1) * 0.94 + 0.03 FROM filler; INSERT INTO t_visit SELECT u_id, s_id FROM ( SELECT u.id AS u_id, s.id AS s_id, u.gender, s.male, RAND(20100325 << 2) AS rnds, RAND(20100325 << 3) AS rndm FROM t_user u CROSS JOIN t_site s ) q WHERE rnds < 0.05 AND rndm < CASE gender WHEN 'M' THEN male ELSE 1 - male END; </pre> </div> <p>There are <strong>1,000</strong> users and <strong>1,000</strong> sites. The sites are assigned a <q>maleness</q> from <strong>0.03</strong> to <strong>0.97</strong>.</p> <p>Users randomly visit the sites according to their gender and the site gender distribution. 
There are <strong>25</strong> visits per user in average.</p> <p>Let’s try to guess the users’ gender and return only wrong guesses.</p> <p>We will assume that the user is male when the posterior probability of the user being male is more than <strong>0.99</strong>, female if that is less than <strong>0.01</strong>, undefined if within <strong>0.01</strong> and <strong>0.99</strong>:</p> <pre> SELECT *, CASE WHEN posterior < 0.01 THEN 'F' WHEN posterior > 0.99 THEN 'M' ELSE 'U' END AS guessed FROM ( SELECT u.*, prior * EXP(SUM(LN(male))) / (prior * EXP(SUM(LN(male))) + (1 - prior) * EXP(SUM(LN(1 - male)))) AS posterior FROM ( SELECT 0.5 AS prior ) vars CROSS JOIN t_user u LEFT JOIN t_visit v ON v.u_id = u.id LEFT JOIN t_site s ON s.id = v.s_id GROUP BY u.id ) q HAVING guessed <> gender </pre> <div> <table> <tr> <th>id</th> <th>name</th> <th>gender</th> <th>posterior</th> <th>guessed</th> </tr> <tr> <td>51</td> <td>User 51</td> <td>F</td> <td>0.652234131074669</td> <td>U</td> </tr> <tr> <td>53</td> <td>User 53</td> <td>F</td> <td>0.87625067361204</td> <td>U</td> </tr> <tr> <td>94</td> <td>User 94</td> <td>M</td> <td>0.732238662361337</td> <td>U</td> </tr> <tr> <td>264</td> <td>User 264</td> <td>F</td> <td>0.0520209347475727</td> <td>U</td> </tr> <tr> <td>475</td> <td>User 475</td> <td>M</td> <td>0.974230285094509</td> <td>U</td> </tr> <tr> <td>497</td> <td>User 497</td> <td>M</td> <td>0.966568719694869</td> <td>U</td> </tr> <tr> <td>542</td> <td>User 542</td> <td>F</td> <td>0.0685609699288645</td> <td>U</td> </tr> <tr> <td>595</td> <td>User 595</td> <td>M</td> <td>0.984478426560255</td> <td>U</td> </tr> <tr> <td>742</td> <td>User 742</td> <td>F</td> <td>0.0334681988009631</td> <td>U</td> </tr> <tr> <td>768</td> <td>User 768</td> <td>M</td> <td>0.960799229888108</td> <td>U</td> </tr> <tr> <td>800</td> <td>User 800</td> <td>F</td> <td>0.0181411777994256</td> <td>U</td> </tr> <tr> <td>867</td> <td>User 867</td> <td>F</td> <td>0.0401728770664721</td> <td>U</td> </tr> <tr> <td>882</td> <td>User 882</td> <td>M</td> <td>0.884671868426923</td> <td>U</td> </tr> <tr> <td>902</td> <td>User 902</td> <td>F</td> <td>0.802525467489821</td> <td>U</td> </tr> <tr> <td colspan="100">14 rows fetched in 0.0006s (0.1989s)</td> </tr> </table> </div> <div> <table> <tr> <th>id</th> <th>select_type</th> <th>table</th> <th>type</th> <th>possible_keys</th> <th>key</th> <th>key_len</th> <th>ref</th> <th>rows</th> <th>filtered</th> <th>Extra</th> </tr> <tr> <td>1</td> <td>PRIMARY</td> <td><derived2></td> <td>ALL</td> <td></td> <td></td> <td></td> <td></td> <td>1000</td> <td>100.00</td> <td></td> </tr> <tr> <td>2</td> <td>DERIVED</td> <td><derived3></td> <td>system</td> <td></td> <td></td> <td></td> <td></td> <td>1</td> <td>100.00</td> <td>Using temporary; Using filesort</td> </tr> <tr> <td>2</td> <td>DERIVED</td> <td>u</td> <td>ALL</td> <td></td> <td></td> <td></td> <td></td> <td>871</td> <td>100.00</td> <td></td> </tr> <tr> <td>2</td> <td>DERIVED</td> <td>v</td> <td>ref</td> <td>PRIMARY</td> <td>PRIMARY</td> <td>4</td> <td>20100325_bayes.u.id</td> <td>12</td> <td>100.00</td> <td>Using index</td> </tr> <tr> <td>2</td> <td>DERIVED</td> <td>s</td> <td>eq_ref</td> <td>PRIMARY</td> <td>PRIMARY</td> <td>4</td> <td>20100325_bayes.v.s_id</td> <td>1</td> <td>100.00</td> <td></td> </tr> <tr> <td>3</td> <td>DERIVED</td> <td></td> <td></td> <td></td> <td></td> <td></td> <td></td> <td></td> <td></td> <td>No tables used</td> </tr> </table> </div> <pre> select `q`.`id` AS `id`,`q`.`name` AS `name`,`q`.`gender` AS 
`gender`,`q`.`posterior` AS `posterior`,(case when (`q`.`posterior` < 0.01) then 'F' when (`q`.`posterior` > 0.99) then 'M' else 'U' end) AS `guessed` from (select `20100325_bayes`.`u`.`id` AS `id`,`20100325_bayes`.`u`.`name` AS `name`,`20100325_bayes`.`u`.`gender` AS `gender`,(('0.5' * exp(sum(ln(`20100325_bayes`.`s`.`male`)))) / (('0.5' * exp(sum(ln(`20100325_bayes`.`s`.`male`)))) + ((1 - '0.5') * exp(sum(ln((1 - `20100325_bayes`.`s`.`male`))))))) AS `posterior` from (select 0.5 AS `prior`) `vars` join `20100325_bayes`.`t_user` `u` left join `20100325_bayes`.`t_visit` `v` on((`20100325_bayes`.`v`.`u_id` = `20100325_bayes`.`u`.`id`)) left join `20100325_bayes`.`t_site` `s` on((`20100325_bayes`.`s`.`id` = `20100325_bayes`.`v`.`s_id`)) where 1 group by `20100325_bayes`.`u`.`id`) `q` having (convert(`guessed` using utf8) <> `q`.`gender`) </pre> <p>From <strong>1,000</strong> users, we only have <strong>14</strong> results outside the credible interval of <strong>99%</strong> and all of these are undefined rather than false.</p>";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:23:"Alex Bolenok (Quassnoi)";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:33;a:6:{s:4:"data";s:38:" ";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:5:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:15:"fast IO and ndb";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:69:"tag:blogger.com,1999:blog-1526532204016125586.post-542729676327822358";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:61:"http://jonasoreland.blogspot.com/2010/03/fast-io-and-ndb.html";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:2743:"I read Mark's blog entry about fast IO for PBXT, InnoDB (1.0.6 plugin) and MyISAM with some interest. Then there was the durable, not durable and really not durable blog in which Mark and LinuxJedi discussed cluster, and Mark wondered how much IO one could get from 1 data-node with ndb. So I decided to try. The setup is similar: ndb cluster using disk-tables, with a tablespace stored in /dev/shm. Data occupied 7.3G, which is slightly less than (some of) the others; this is because we don't support having columns with indexes stored on disk. 
I.e. any column that has an index will be stored only in memory instead. ndb does however not support "handler" statements, so I used oltp-point-select-all-cols instead. sysbench --test=oltp --mysql-host=foobar --mysql-user=root \--mysql-password=pw \--mysql-db=test \--oltp-table-size=40000000 \--max-time=60 --max-requests=0 \--mysql-table-engine=ndb \--db-ps-mode=disable --mysql-engine-trx=yes \--oltp-read-only --oltp-skip-trx --oltp-test-mode=simple \--oltp-point-select-all-cols \--oltp-dist-type=uniform \--oltp-range-size=1000 \--num-threads=1 --seed-rng=1 run. Results are not super great... so I did an ndbapi program (which is roughly the equivalent of handler statements). These numbers look a bit better, but the datanode (ndbmtd) was almost idle when running this... so I made another experiment. Instead of retrieving 1 row at a time (set @r = rand() % 40000000; select * from sbtest where id = @r) I changed to retrieve 16 rows at a time (set @r1 = rand(); set @r2 = rand(); select * from sbtest where id in (@r1...@r16)). I believe these results are relevant given that Mark's aim was to test fast IO, and I think that this rewrite won't affect other SEs as much as it does with ndb. And those numbers were quite ok. As an extra bonus, I also tried using our memory tables (alter table sbtest storage memory), also with ndbapi and 16 rows at a time. These tests were executed on a 16-core machine (Intel(R) Xeon(R) CPU E7420@2.13GHz) and in my config.ini I had DiskPageBufferMemory = 1GB, DiskIOThreadPool=8, FileSystemPath=/dev/shm. Note: I first created tables in myisam, then converted them to ndb disk-tables by issuing "alter table sbtest storage disk tablespace TS engine = ndb" and all tests were using ndb-cluster-connection-pool=4. All results: SQL 2756 4998 7133 9130 10720 12222 13305 14190 14626 15287 15547 15955 16143 16334 16507 16757; ndbapi 4177 7581 10319 13162 15245 17064 18874 20652 20850 24131 25976 24910 29832 30666 32625 34841; b=16 1520 11374 35454 53396 55601 63248 71819 78468 103324 97330 97572 111099 125564 126790 133873 141588; mm b=16 73004 128296 172207 207361 246907 270783 293753 312435 327006 345085 346924 374837 360747 372192 376887 394862";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Thu, 25 Mar 2010 17:16:00 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:5162:"I read Mark's <a href="http://www.facebook.com/note.php?note_id=378043115932&_fb_noscript=1">blog entry</a> about fast IO for PBXT, InnoDB (1.0.6 plugin) and MyISAM with some interest.<br /><br />then there was the <a href="http://mysqlha.blogspot.com/2010/03/durable-not-durable-and-really-not.html">durable, not durable and really not durable blog</a> in which Mark and <a href="http://www.blogger.com/profile/03203602815866231586">LinuxJedi</a> discussed cluster, and Mark wondered how much IO one could get from 1 data-node with ndb.<br /><br />so I decided to try.<br />the setup is similar.<br />ndb cluster using disk-tables, with a tablespace stored in /dev/shm.<br /><br />Data occupied 7.3G, which is slightly less than (some of) the others, this is because we don't support having columns with indexes stored on disk. 
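A side note, not from the original post: the 16-rows-at-a-time experiment described above boils down to replacing sixteen single-row primary-key lookups with one IN-list read, so the data node receives a batch of keys per round trip. A minimal sketch against the post's sbtest table (the literal ids here are invented; the post draws them with rand()): <pre>
-- Single-row pattern: one primary-key lookup, one round trip per row.
SELECT * FROM sbtest WHERE id = 1234567;
-- Batched pattern: 16 primary-key lookups in one statement
-- (IN list shortened to four invented ids here for brevity).
SELECT * FROM sbtest WHERE id IN (1234567, 2345678, 3456789, 4567890);
</pre> Presumably this amortization of the per-round-trip overhead is why the b=16 rows in the results scale so much better than the single-row SQL and ndbapi rows.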
I.e. any column that has an index will be stored only in memory instead.<br /><br />ndb does however not support "handler" statements, so I used oltp-point-select-all-cols instead.<br /><pre><br />sysbench --test=oltp --mysql-host=foobar --mysql-user=root \<br />--mysql-password=pw \<br />--mysql-db=test \<br />--oltp-table-size=40000000 \<br />--max-time=60 --max-requests=0 \<br />--mysql-table-engine=<b>ndb</b> \<br />--db-ps-mode=disable --mysql-engine-trx=yes \<br />--oltp-read-only --oltp-skip-trx --oltp-test-mode=simple \<br />--<b>oltp-point-select-all-cols</b> \<br />--oltp-dist-type=uniform \<br />--oltp-range-size=1000 \<br />--num-threads=1 --seed-rng=1 run<br /></pre><br /><br /><a href="http://2.bp.blogspot.com/_F31z7Q6TpVA/S6ueiF9EHdI/AAAAAAAAAAc/tPB5gTAojkM/s1600/Screenshot-1.png"><img style="cursor: pointer;" src="http://2.bp.blogspot.com/_F31z7Q6TpVA/S6ueiF9EHdI/AAAAAAAAAAc/tPB5gTAojkM/s320/Screenshot-1.png" alt="" id="BLOGGER_PHOTO_ID_5452626082413157842" border="0" /></a><br /><br />results are not super great...<br />so I did an ndbapi program (which is roughly the equivalent of handler statements)<br /><br /><a href="http://3.bp.blogspot.com/_F31z7Q6TpVA/S6ufGCmtWsI/AAAAAAAAAAk/Y-bgSupxnVQ/s1600/Screenshot-2.png"><img style="cursor: pointer; width: 320px; height: 122px;" src="http://3.bp.blogspot.com/_F31z7Q6TpVA/S6ufGCmtWsI/AAAAAAAAAAk/Y-bgSupxnVQ/s320/Screenshot-2.png" alt="" id="BLOGGER_PHOTO_ID_5452626699989375682" border="0" /></a><br /><br />these numbers look a bit better. but the datanode (ndbmtd) was almost idle when running this...<br /><br />so I made another experiment. Instead of retrieving 1 row at a time (set @r = rand() % 40000000; select * from sbtest where id = @r) I changed to retrieve 16 rows at a time (set @r1 = rand(); set @r2 = rand(); select * from sbtest where id in (@r1...@r16)).<br /><br /><a href="http://3.bp.blogspot.com/_F31z7Q6TpVA/S6ugOOM1T6I/AAAAAAAAAAs/wAP7RLnXx_k/s1600/Screenshot-3.png"><img style="cursor: pointer; width: 320px; height: 125px;" src="http://3.bp.blogspot.com/_F31z7Q6TpVA/S6ugOOM1T6I/AAAAAAAAAAs/wAP7RLnXx_k/s320/Screenshot-3.png" alt="" id="BLOGGER_PHOTO_ID_5452627940052651938" border="0" /></a><br /><br />I believe these results are relevant given that Mark's aim was to test fast IO,<br />and I think that this rewrite won't affect other SEs as much as it does with ndb.<br />and those numbers were quite ok.<br /><br />as an extra bonus, I also tried using our memory tables (alter table sbtest storage memory), also with ndbapi and 16 rows at a time.<br /><br /><a href="http://3.bp.blogspot.com/_F31z7Q6TpVA/S6uhjiIfbqI/AAAAAAAAAA8/EBXkoeu0T60/s1600/Screenshot-4.png"><img style="cursor: pointer; width: 320px; height: 126px;" src="http://3.bp.blogspot.com/_F31z7Q6TpVA/S6uhjiIfbqI/AAAAAAAAAA8/EBXkoeu0T60/s320/Screenshot-4.png" alt="" id="BLOGGER_PHOTO_ID_5452629405692030626" border="0" /></a><br /><br />these tests were executed on a 16-core machine Intel(R) Xeon(R) CPU E7420@2.13GHz<br />and in my config.ini I had<br /><pre><br />DiskPageBufferMemory = 1GB<br />DiskIOThreadPool=8<br />FileSystemPath=/dev/shm<br /></pre><br /><br />note: I first created tables in myisam, then converted them to ndb disk-tables by issuing "alter table sbtest storage disk tablespace TS engine = ndb" and all tests were using ndb-cluster-connection-pool=4<br /><br />all results:<br /><pre><br />SQL 2756 4998 7133 9130 10720 12222 13305 14190 14626 15287 15547 15955 16143 16334 16507 16757<br />ndbapi 4177 7581 10319 13162 15245 17064 18874 20652 20850 24131 
25976 24910 29832 30666 32625 34841<br />b=16 1520 11374 35454 53396 55601 63248 71819 78468 103324 97330 97572 111099 125564 126790 133873 141588<br />mm b=16 73004 128296 172207 207361 246907 270783 293753 312435 327006 345085 346924 374837 360747 372192 376887 394862<br /></pre><div><img width="1" height="1" src="https://blogger.googleusercontent.com/tracker/1526532204016125586-542729676327822358?l=jonasoreland.blogspot.com" alt="" /></div><br/>PlanetMySQL Voting: <a href="http://planet.mysql.com/entry/vote/?entry_id=24038&vote=1&apivote=1">Vote UP</a> / <a href="http://planet.mysql.com/entry/vote/?entry_id=24038&vote=-1&apivote=1">Vote DOWN</a>";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:13:"Jonas Oreland";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:34;a:6:{s:4:"data";s:108:" ";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:89:"Kontrollbase – graph “no data to display” on new install has been fixed";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:30:"http://kontrollsoft.com/?p=745";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:62:"http://feedproxy.google.com/~r/Kontrollsoft/~3/PGSs_Wtl87Y/745";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:337:"If you have been wondering why the overview and graphs pages say “no data to display” on the graphs when you first install Kontrollbase, it’s because there’s no data in the database being returned from the queries that generate the graphs – this is because a new install has no data to graph. 
This has [...]";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Thu, 25 Mar 2010 16:51:52 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:14:{i:0;a:5:{s:4:"data";s:12:"announcement";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:9:"analytics";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:8:"database";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:3;a:5:{s:4:"data";s:12:"fusioncharts";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:4;a:5:{s:4:"data";s:8:"graphing";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:5;a:5:{s:4:"data";s:6:"graphs";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:6;a:5:{s:4:"data";s:12:"kontrollbase";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:7;a:5:{s:4:"data";s:5:"linux";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:8;a:5:{s:4:"data";s:5:"mysql";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:9;a:5:{s:4:"data";s:12:"mysql server";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:10;a:5:{s:4:"data";s:3:"php";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:11;a:5:{s:4:"data";s:10:"relational";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:12;a:5:{s:4:"data";s:9:"reporting";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:13;a:5:{s:4:"data";s:7:"reports";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:643:"If you have been wondering why the overview and graphs pages say “no data to display” on the graphs when you first install Kontrollbase, it’s because there’s no data in the database being returned from the queries that generate the graphs – this is because a new install has no data to graph. 
This has [...]<img src="http://feeds.feedburner.com/~r/Kontrollsoft/~4/PGSs_Wtl87Y" height="1" width="1" /><br/>PlanetMySQL Voting: <a href="http://planet.mysql.com/entry/vote/?entry_id=24034&vote=1&apivote=1">Vote UP</a> / <a href="http://planet.mysql.com/entry/vote/?entry_id=24034&vote=-1&apivote=1">Vote DOWN</a>";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:9:"Matt Reid";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:35;a:6:{s:4:"data";s:148:" ";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:52:"Please break our open source business strategy model";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:47:"http://blogs.the451group.com/opensource/?p=1491";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:60:"http://feedproxy.google.com/~r/451opensource/~3/AFvIkmhetUU/";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:4312:"Last week I presented “From support services to software services – the evolution of open source business strategies” at the OSBC event in San Francisco. The presentation was effectively a work in progress update on our research into the various strategies employed by technology vendors to generate revenue from open source software. It included a partial explanation of my theory that those strategies do not exist in isolation, but are steps on an evolutionary process, and also introduced our model for visualizing the core elements of an open source-related business strategy. I provided a number of examples of how the model could be used to compare the strategies of various open source businesses. Here, for example, is the visualization of MySQL’s strategy. I was pleased with the response to the presentation, not least the number of people who asked us to send them the slide so they could fill it in for their company and send it back to us. This is definitely something we would like to do in the future but before we do I would like to ensure we have dealt with any problems related to the model. For now I would be more interested in hearing from companies that feel their strategy is NOT covered by the model. As Jack Repenning has pointed out, the model does not offer the granularity to express some of the nuances of the various “open complement “ strategies where open source code is not monetized directly but via complementary products (and in my own presentation I had to go beyond the model to discuss “open inside” – building proprietary products on open foundations, and “open edge” – using open source to drive innovation on top of a closed platform). My initial feeling is that there will always be a level of detail that cannot be expressed in a simplified model such as this, although if I can build them in I will. The development model category also needs some tinkering, not least to cover “gated community” approaches. Additionally, of course, the model is not great when it comes to multi-product companies (although multiple models can be used to explain a larger strategy). 
So anyway, if you think your company does not fit our model, do please tell us how. To help you understand how the model works, here’s a quick user guide and glossary of terms. Revenue triggers: These are the things that paying customers actually pay money for (apart from advertising which is an indirect relationship). They should be pretty self-explanatory. When we refer to “support services” we mean support, training, consulting, implementation services etc. “Software services” refers to SaaS and cloud delivery. Vendors can have multiple revenue triggers for a single product. Software license: For the purposes of this exercise we are interested in whether the company has a preference for permissive or reciprocal licensing for the underlying open source project, or uses both. End user licensing: What licensing strategy is applied to the product that customers pay for (as opposed to the project that it is based on)? It could be the same open source license (single open source) or a combination of open source licenses (assembled open source). It could be that the same code is available using open source and commercial licenses (dual licensing) or that commercial extensions are available (open core). Alternatively, a vendor may not monetize the open source project itself, but offer complementary software or hardware products (open complement), or may turn the open source code into a fully proprietary product (closed). Pick one. Development model: This requires a two-part response. Is the open source code developed in public, in private, or a combination of the two (public/private)? Pick one. Is the development effort dominated by employees of a vendor, or the result of true community collaboration, or an aggregate of multiple projects? Pick one. Copyright: Who owns the copyright for the open source code? Is it the vendor in question, a foundation, a distributed collection of companies/individuals, or another company (withheld)? Normally this would be a matter of picking one of the four options, although if a portion of the copyright is withheld, that could be used along with one of the other three. Do your worst. 
";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Thu, 25 Mar 2010 13:16:09 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:22:{i:0;a:5:{s:4:"data";s:15:"Business models";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:11:"Conferences";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:8:"Software";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:3;a:5:{s:4:"data";s:9:"451 group";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:4;a:5:{s:4:"data";s:13:"451caostheory";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:5;a:5:{s:4:"data";s:8:"451group";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:6;a:5:{s:4:"data";s:19:"business strategies";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:7;a:5:{s:4:"data";s:10:"caostheory";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:8;a:5:{s:4:"data";s:5:"Linux";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:9;a:5:{s:4:"data";s:11:"matt aslett";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:10;a:5:{s:4:"data";s:10:"mattaslett";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:11;a:5:{s:4:"data";s:14:"matthew aslett";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:12;a:5:{s:4:"data";s:13:"matthewaslett";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:13;a:5:{s:4:"data";s:15:"open complement";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:14;a:5:{s:4:"data";s:9:"open edge";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:15;a:5:{s:4:"data";s:11:"open inside";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:16;a:5:{s:4:"data";s:9:"Open-Core";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:17;a:5:{s:4:"data";s:11:"open-source";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:18;a:5:{s:4:"data";s:10:"opensource";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:19;a:5:{s:4:"data";s:4:"OSBC";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:20;a:5:{s:4:"data";s:13:"The 451 Group";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:21;a:5:{s:4:"data";s:6:"the451";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:5548:"<p>Last week I presented “<a href="http://www.osbc.com/ehome/index.php?eventid=7578&tabid=3659&#matt">From support services to software services – the evolution of open source business strategies</a>” at the OSBC event in San 
Francisco.</p> <p>The presentation was effectively a work in progress update on our research into the various strategies employed by technology vendors to generate revenue from open source software.</p> <p>It included a partial explanation of my theory that those strategies do not exist in isolation, but are steps on an evolutionary process, and also introduced our model for visualizing the core elements of an open source-related business strategy.</p> <p><a href="http://picasaweb.google.com/lh/photo/bXWjXLqQOh0yZP1ZMaiHkg?feat=embedwebsite"><img src="http://lh4.ggpht.com/_P6_U1HkHY4E/S6t09rLGzeI/AAAAAAAABAk/GzgEn-OZUJI/s400/oss%20strategy.jpg" /></a></p> <p>I provided a number of examples of how the model could be used to compare the strategies of various open source businesses. Here, for example, is the visualization of MySQL’s strategy.</p> <p><a href="http://picasaweb.google.com/lh/photo/H9FoiL5sRjPenRTwlKazyQ?feat=embedwebsite"><img src="http://lh6.ggpht.com/_P6_U1HkHY4E/S6t09ujk4lI/AAAAAAAABAo/hSUqJXavMA8/s400/mysql.jpg" /></a></p> <p>I was pleased with the response to the presentation, not least the number of people who asked us to send them the slide so they could fill it in for their company and send it back to us.</p> <p>This is definitely something we would like to do in the future but before we do I would like to ensure we have dealt with any problems related to the model. For now I would be more interested in hearing from companies that feel their strategy is NOT covered by the model.</p> <p>As Jack Repenning has <a href="http://stephesblog.blogs.com/my_weblog/2010/03/visualizing-open-source-business-models.html?cid=6a00d8341c57b753ef0120a965ff34970b#comment-6a00d8341c57b753ef0120a965ff34970b">pointed out</a>, the model does not offer the granularity to express some of the nuances of the various “open complement “ strategies where open source code is not monetized directly but via complementary products (and in my own presentation I had to go beyond the model to discuss “open inside” – building proprietary products on open foundations, and “open edge” – using open source to drive innovation on top of a closed platform).</p> <p>My initial feeling is that there will always be a level of detail that cannot be expressed in a simplified model such as this, although if I can build them in I will.</p> <p>The development model category also needs some tinkering, not least to cover “gated community” approaches.</p> <p>Additionally, of course, the model is not great when it comes to multi-product companies (although multiple models can be used to explain a larger strategy).</p> <p>So anyway, if you think your company does not fit our model, do please tell us how. To help you understand how the model works, here’s a quick user guide and glossary of terms.</p> <p><strong>Revenue triggers:</strong><br /> These are the things that paying customers actually pay money for (apart from advertising which is an indirect relationship). They should be pretty self-explanatory. When we refer to “support services” we mean support, training, consulting, implementation services etc. “Software services” refers to SaaS and cloud delivery. 
Vendors can have multiple revenue triggers for a single product.<br /> <strong><br /> Software license:</strong><br /> For the purposes of this exercise we are interested in whether the company has a preference for permissive or reciprocal licensing for the underlying open source project, or uses both.</p> <p><strong>End user licensing:</strong><br /> What licensing strategy is applied to the product that customers pay for (as opposed to the project that it is based on)? It could be the same open source license (single open source) or a combination of open source licenses (assembled open source). It could be that the same code is available using open source and commercial licenses (dual licensing) or that commercial extensions are available (open core). Alternatively, a vendor may not monetize the open source project itself, but offer complementary software or hardware products (open complement), or may turn the open source code into a fully proprietary product (closed). Pick one.<br /> <strong><br /> Development model:</strong><br /> This requires a two-part response. Is the open source code developed in public, in private, or a combination of the two (public/private)? Pick one.<br /> Is the development effort dominated by employees of a vendor, or the result of true community collaboration, or an aggregate of multiple projects? Pick one.</p> <p><strong>Copyright:</strong><br /> Who owns the copyright for the open source code? Is it the vendor in question, a foundation, a distributed collection of companies/individuals, or another company (withheld)? Normally this would be a matter of picking one of the four options, although if a portion of the copyright is withheld, that could be used along with one of the other three.</p> <p>Do your worst.</p>";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:13:"The 451 Group";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:36;a:6:{s:4:"data";s:53:" ";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:30:"Ready for the User Conference?";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:70:"tag:blogger.com,1999:blog-1767548987184410343.post-3338427264477357153";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:67:"http://izoratti.blogspot.com/2010/03/ready-for-user-conference.html";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:2536:"I cannot recall any significant moment of the conferences in 2002 and 2003 (simply because I was not there) but… In 2005 we had MySQL 5. Peter Zaitsev was still working in the benchmark team for MySQL AB. His presentation on InnoDB performance and tuning was enlightening for many. In 2006 we discovered the Pluggable Storage Engine API. 
I cannot recall any significant moment of the conferences in 2002 and 2003 (simply because I was not there) but…

In 2005 we had MySQL 5. Peter Zaitsev was still working in the benchmark team for MySQL AB. His presentation on InnoDB performance and tuning was enlightening for many.

In 2006 we discovered the Pluggable Storage Engine API. Jim Starkey joined MySQL AB and we announced Falcon. On a brighter note, I was lucky enough to meet Paul McCullagh the day before the conference. Paul is one of the nicest and most brilliant persons I have ever met.

2007 was all about 5.1. We announced the roadmap for 6.0 and our online cross-engine backup.

In 2008 we were Sun, and for the first time Marten Mickos left his place on the UC stage to Jonathan Schwartz.

In 2009 we had the Oracle announcement and the Percona conference. You may describe that conference in many ways, but it certainly wasn't boring!

And now, 2010. Another User Conference with tremendous content. It's an incredible occasion to learn from the key players at MySQL and in the MySQL ecosystem.

I will present at the User Conference. For once, I am back to my roots, i.e. Data Warehousing and Business Intelligence. DW projects filled my working life from 1994 until 2005.

Do not expect any elegant fragment of C code that may improve the performance of your DB 100x (or crash all your servers). You'll see real-life ideas and solutions on how to use MySQL in Data Warehousing and in a typical (is there one, really?) Business Intelligence environment. Perhaps you have already implemented something similar, or something that suits you better, but hey, sharing is the main point here.

This presentation at the UC is just the beginning of a series. More to come, since this is a hot topic and users are asking more and more from MySQL in this sector. They want to use MySQL in BI in many, many ways. Some are simply looking for a reporting platform: you replicate your data and there you go, you execute some reports. Others have more specific needs and must transform their data into information, in a typical BI style. Still others are enterprises with large data warehouses that see MySQL as the perfect data mart engine. These topics and more in my presentation at the UC.

If you are part of the local London MySQL group and you can't travel to CA, don't worry: we will run a series of meetups, and they are likely to be recorded and presented in other cities as well.
In any case, I hope to see you at the UC. Check the message board for some Euro spots!


ACID tradeoffs, modularity, plugins, Drizzle
By Frazer Clement | Thu, 25 Mar 2010 | http://messagepassing.blogspot.com/2010/03/acid-tradeoffs-modularity-plugins.html
Tags: mysql, cluster, rambling, nosql, design, general, distributed-systems

Most software people are aware of the ACID acronym coined by Jim Gray. With the growth of the web and open source, the scaling and complexity constraints imposed on DBMS implementations supporting ACID are more visible, and new (or at least newly named) compromises and tradeoffs are being discussed widely. The better-known NoSQL systems are giving insight by example into particular choices of tradeoffs.

Working at MySQL, I have often been surprised at the variety of potential alternatives when implementing a DBMS, and at the number of applications which don't need the full set of ACID letters in their strictest form. The original MySQL storage engine, MyISAM, is one of the first and most successful examples of an 'ACID remix'. The people drawn to DBMS development work often have a perfectionist streak, which can make them prefer 'nothing' over 'imperfect'. MyISAM was and still is a flag-bearer for 'good enough'. Perhaps we should be less modest and call it 'more than good enough'.

One seldom-discussed benefit of MySQL's storage engine architecture is that the pressure to build 'The One True Storage Engine' is reduced. DBMS products with one fixed database engine need to optimise for all supported use cases. This is a great engineering challenge, but it increases design effort, configuration and auto-tuning requirements, constraints on any design change or re-optimisation, and so on. With MySQL, there are multiple existing storage engines, each with a (sub)set of target use cases in mind. A single MySQL server can maintain and access tables in different storage engines, each tuned as closely as possible to the use case for its data, without adding complexity to unrelated engines.
Engines can be wildly optimised for a narrow use case because plausible alternative engines are available for other use cases.

I understand that one aim of the Drizzle project (http://drizzle.org/) is to extend the modularity of the MySQL Server on multiple axes, allowing diversity to flourish. As a one-time Java coder who enjoyed the pleasures of design-by-interface, I can see the attraction. While the effort is guided by an actual need for modularity and real examples of alternative plugins, it can be a great force multiplier. There is always the risk of modularity for its own sake, a branch of Architecture Astronautics. Sure symptoms, which I may have suffered from in the past, include class names like FactoryFactory, PolicyPolicy, or [Anything]Broker.

Another good vibe from Drizzle is the microkernel concept, although I would say there's some terminological abuse occurring here! Perhaps it could more reasonably be said that MySQL has a TeraKernel and Drizzle has a MegaKernel? In any case the motivations are good. Decoupling the huge chunks of functionality glued together inside MySQLD is great for long-term software integrity, understanding dependencies, finding (and introducing) bugs, and it might make it easier to start adding functionality again. Replication seems especially ripe ground for alternative plugins. User authentication is another often-requested 'chunk'. It will take longer to crystallise interfaces for more deeply embedded areas like the query Optimizer/Executor, but if those interfaces arise from a real need, then that need can drive the API design.

One aspect of storage engine modularity that is not often mentioned is that some MySQL storage engines also moonlight with other products. The Berkeley database (BDB) is probably the oldest and most promiscuous, embedded in DNS daemons, LDAP servers and all sorts of other places. Ndb is unusual in that it can be used from separate MySQLD and other NdbApi processes at the same time. InnoDB has also recently added an embedded variant. This trend will accelerate, especially when some of the distributed NoSQL systems start supporting 'pluggable local storage' APIs. I imagine that a NoSQL local storage engine API could be somewhat simpler to implement than the MySQL SE API, at least to start with!
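To make the per-table engine choice concrete, here is a minimal SQL sketch; the table and column names are purely illustrative, not taken from any of the posts above:

-- One server, two engines: a transactional table on InnoDB and an
-- append-mostly log table on MyISAM (the 'good enough' remix).
CREATE TABLE orders (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  customer_id INT NOT NULL,
  total DECIMAL(10,2) NOT NULL
) ENGINE=InnoDB;

CREATE TABLE click_log (
  ts TIMESTAMP NOT NULL,
  url VARCHAR(255) NOT NULL
) ENGINE=MyISAM;

-- Each table reports which engine backs it:
SHOW TABLE STATUS WHERE Name IN ('orders', 'click_log');

Queries can join across both tables transparently; the engine remains an implementation detail of each table, which is exactly the decoupling the post describes.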
UC2010 - MySQL Cluster Deploy and Perf Tuning BP
By Johan Andersson | Thu, 25 Mar 2010 | http://johanandersson.blogspot.com/2010/03/uc2010-mysql-cluster-deploy-and-perf.html
Tags: MySQL Cluster, uc2010

At the MySQL UC 2010 (on Tuesday 4/14 at 11:55 am) my colleague Joffrey Michaie and I will present MySQL Cluster - Deployment Best Practices (http://en.oreilly.com/mysql2010/public/schedule/detail/12446). We will talk about what is important to think about when deploying MySQL Cluster: what to do, what not to do, operational aspects, and a few other practical things. This session will be a great follow-on to the introductory session on MySQL Cluster (http://en.oreilly.com/mysql2010/public/schedule/detail/12469) and the tutorial (http://en.oreilly.com/mysql2010/public/schedule/detail/12438).

After the deployment session, the same day at 2:00 pm, I will also have a session on MySQL Cluster Performance Tuning Best Practices (http://en.oreilly.com/mysql2010/public/schedule/detail/12445).
In that session you will learn tips and tricks on how to tune your cluster, e.g. how to design and tune your schema and queries so they run optimally on MySQL Cluster.

So if you want to see what happens after this slide, come and join us!

[Image: MySQL UC2010 Cluster Deploy slide: http://2.bp.blogspot.com/_ZP94gz0B_KE/S6ss23JJelI/AAAAAAAAAJw/wQkWYgb6Uk4/s1600/+MySQL_UC2010_Cluster_Deploy.png]


Q4M 0.9.3 prerelease (with support for "concurrent compaction")
By Kazuho Oku | Thu, 25 Mar 2010 | http://developer.cybozu.co.jp/kazuho/2010/03/q4m-093-prerele.html?lang=en

Q4M (Queue for MySQL, http://q4m.31tools.com/) periodically performs an operation called "compaction", a sort of garbage collection that reclaims empty space from a queue file and returns it to the OS.

The pitfall until now was that during compaction, all operations on the queue table were blocked.

My opinion was (and is) that this is not a serious problem for most users, since the time required for compaction will be small in most cases (it depends on the number and size of the rows alive in the queue table, and the number of live rows will usually be small).

But for use cases where fast response is a requirement, I have added a "queue_use_concurrent_compaction" option to Q4M in the 0.9.3 prerelease. When the variable is set to one in my.cnf, INSERTs will not be blocked during compaction. Another configuration variable, queue_concurrent_compaction_interval, is available to fine-tune the response time of INSERTs during compaction: the smaller you set it, the faster INSERTs respond during compaction, although compaction itself becomes slower as a side effect.
my.cnf:

# enable / disable concurrent compaction (0: disabled (default), 1: enabled)
queue_use_concurrent_compaction=1

# handle INSERTs for every N bytes of data compacted (default: 1048576)
queue_concurrent_compaction_interval=1048576

If you are already using Q4M without any problems, I recommend sticking to the version you are using, since the stability of Q4M might have degraded with the introduction of this feature. On the other hand, if you were having problems with this issue, or are planning to use Q4M for a new application, I recommend using this release. Have fun!
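For context, this is roughly how a Q4M queue is produced and consumed; a minimal sketch following the pattern in Q4M's documentation, with hypothetical database and table names:

-- A Q4M queue table uses the QUEUE storage engine
-- ('jobs' database and 'job_queue' table are hypothetical names).
CREATE TABLE job_queue (
  message TEXT NOT NULL
) ENGINE=QUEUE;

-- Producer: a plain INSERT enqueues a row; this is the statement that
-- queue_use_concurrent_compaction keeps unblocked during compaction.
INSERT INTO job_queue (message) VALUES ('hello');

-- Consumer: take exclusive ownership of one row, read it, then
-- remove it by leaving owner mode.
SELECT queue_wait('jobs.job_queue'); -- blocks until a row is available
SELECT message FROM job_queue;       -- the owner sees only its row
SELECT queue_end();                  -- deletes the consumed row

Per the documented API, a row taken with queue_wait() is invisible to other consumers until queue_end() removes it (or queue_abort() returns it to the queue).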
And if you find any problems, please send stacktraces to me :-)


mycheckpoint (Rev. 118): alerts, email notifications and more
By Shlomi Noach | Thu, 25 Mar 2010 | http://code.openark.org/blog/mysql/mycheckpoint-rev-118-alerts-email-notifications-and-more
Tags: MySQL, Monitoring, mycheckpoint, python

Revision 118 of mycheckpoint (http://code.openark.org/forge/mycheckpoint) has been released. New and updated in this revision:

- Conditional alerts
- Email notifications
- Revised HTML reports, including 24/7 reports
- Updated documentation

With this new revision mycheckpoint turns into a monitoring solution for MySQL. One can now:

- Store measured metrics
- Query for raw, aggregated or digested metrics
- Generate charts for selected metrics
- View HTML reports for selected metrics
- Define alert conditions and query for pending alerts
- Be notified via email on raised or resolved alerts

Conditional alerts

mycheckpoint is SQL-oriented. As such, it allows for the creation of alert conditions, which are nothing more than SQL conditions. For example, we wish to raise an alert when the slave stops replicating (just ping us with an email once this happens):

INSERT INTO alert_condition (condition_eval, description, alert_delay_minutes) VALUES ('seconds_behind_master IS NULL', 'Slave not replicating', 0);

Or when it is too far behind (but since we do maintenance work during the night, it's OK in those hours). We only want to be notified if this goes on for 10 minutes:

INSERT INTO alert_condition (condition_eval, description, alert_delay_minutes) VALUES ('(seconds_behind_master > 60) AND (HOUR(ts) NOT BETWEEN 2 AND 4)', 'Slave lags too far behind', 10);

We want to be notified when the datadir mount point disk usage exceeds 95%. Oh, and please keep nagging us about this as long as it is unresolved:

INSERT INTO alert_condition (condition_eval, description, repetitive_alert) VALUES ('os_datadir_mountpoint_usage_percent > 95', 'datadir mount point is over 95%', 1);

There's much more to alert conditions. You can generate a pending-alerts report, get a textual presentation of raised and pending alerts, view the query which determines what alerts are currently raised, and more. Read more on the alerts documentation page (http://code.openark.org/forge/mycheckpoint/documentation/alerts).
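One more alert, as a hedged sketch in the same documented INSERT INTO alert_condition pattern; whether the metric column is really named threads_connected is an assumption about mycheckpoint's schema, so verify against the metrics it actually records:

-- Hypothetical: nag when connections stay high for 5 minutes.
-- 'threads_connected' as a condition column is my assumption;
-- the alert_condition columns are as documented above.
INSERT INTO alert_condition (condition_eval, description, alert_delay_minutes)
  VALUES ('threads_connected > 500', 'Too many connections', 5);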
Email notifications

Introducing email notifications, mycheckpoint now:

- Sends an email notification when an alert condition is met. See the sample email screenshot (http://code.openark.org/forge/wp-content/uploads/2010/03/mycheckpoint-alerts-email-sample-113.jpeg).
- Sends an email notification when it is unable to access the database.
- Sends reports via mail. Currently only the HTML brief report is supported; the report is attached to the email message as an HTML file.

Alert notifications are automatically sent by mail (once SMTP configuration is in place, see below) when an alert is raised (the alert condition becomes true) or resolved (the alert condition turns false).

Email notifications require simple configuration: SMTP host, SMTP from-address and SMTP to-address. These can be set in the revised defaults file (http://code.openark.org/forge/mycheckpoint/documentation/usage#defaults_file) or on the command line. The following example shows how one can manually send an HTML brief report:

mycheckpoint --defaults-file=/etc/mycheckpoint.cnf --smtp-from=monitor@my-server-company.com --smtp-to=dba@my-server-company.com --smtp-host=mail.my-server-company.com email_brief_report

One should generally set up these parameters in the configuration file (aka defaults file) and forget all about them. mycheckpoint now has a default location for the defaults file: /etc/mycheckpoint.cnf.

Read more on the emails documentation page (http://code.openark.org/forge/mycheckpoint/documentation/emails).

Revised HTML reports

- The brief HTML report has been updated; see the sample (http://code.openark.org/forge/wp-content/uploads/2010/03/mycheckpoint-brief-report-sample-113.html).
- An HTML 24/7 report has been added; see the sample (http://code.openark.org/forge/wp-content/uploads/2010/03/mycheckpoint-24-7-report-sample-107.html). This report shows the distribution of popular metrics across weekdays and hours.

Full HTML reports remain slow to load. I'm putting some work into this, but I'm not sure I can work around the optimizer's limitations on using indexes for GROUPing through views.

Updated documentation

The documentation has been revised, with more detail in the pages. Since mycheckpoint gains more and more features, I saw fit to write a Quick HOWTO page which gets you up to speed, no fuss, with mycheckpoint's usage and features. Read the mycheckpoint Quick HOWTO at http://code.openark.org/forge/mycheckpoint/documentation/quick-howto.

Future plans

Work goes on. These are the unscheduled future tasks I see:

- Custom monitoring + notifications; see my earlier post (http://code.openark.org/blog/mysql/things-to-monitor-on-mysql-the-users-perspective)
- Monitoring InnoDB Plugin & XtraDB status
- PROCESSLIST dump on alerts
- Interactive charts; see my earlier post (http://code.openark.org/blog/mysql/static-charts-vs-interactive-charts)
- A proper man page…

Try it out

Try out mycheckpoint. It's a different kind of monitoring solution: it does not require you to have a web server or complicated dependencies. To the experienced DBA it can further provide valuable raw or digested information in the form of SQL-accessible data. I have used it to find anomalies in past months, doing SQL searches for periods of time where several conditions applied; it really gives you some extra power.

- Download mycheckpoint: https://code.google.com/p/mycheckpoint/
- Project homepage: http://code.openark.org/forge/mycheckpoint
- Documentation: http://code.openark.org/forge/mycheckpoint/documentation
- Report bugs: https://code.google.com/p/mycheckpoint/issues/list

mycheckpoint is released under the New BSD License.
RAID throughput on FusionIO
By Vadim (MySQL Performance Blog) | Thu, 25 Mar 2010 | http://www.mysqlperformanceblog.com/2010/03/24/raid-throughput-on-fusionio/
Tags: benchmarks, mysql

Along with the maximal possible fsync/sec (http://www.mysqlperformanceblog.com/2010/03/23/fsyncs-on-software-raid-on-fusionio/), it is interesting how different software RAID modes affect throughput on FusionIO cards.

The short conclusion: the RAID10 modes really disappointed me; the detailed numbers follow.

To get the numbers I ran a sysbench fileio test with 16KB page size, random reads and writes, 1 and 16 threads, in O_DIRECT mode. The FusionIO cards are the same as in the previous experiment, and I am running XFS with the nobarrier mount option. The OS is CentOS 5.3 with the 2.6.18-128.1.10.el5 kernel.

For RAID modes I used:

- a single card (for a baseline)
- RAID0 over 2 FusionIO cards
- RAID1 over 2 FusionIO cards
- RAID1 over 2 RAID0 partitions (4 cards in total)
- RAID0 over 2 RAID1 partitions (4 cards in total)
- the special RAID10 mode with n2 layout

The last mode is created as:

mdadm --create --verbose /dev/md0 --level=10 --layout=n2 --raid-devices=4 --chunk=64 /dev/fioa /dev/fiob /dev/fioc /dev/fiod

All modes use a 64KB chunk size (different chunk sizes are an interesting question in themselves).

[Graph: IO throughput, 16KB pages, 16 threads: http://www.mysqlperformanceblog.com/wp-content/uploads/2010/03/io_throughput_16kb.png]

As expected, RAID1 over 2 disks shows a hit on write throughput compared to a single disk, but the RAID10 modes over 4 disks surprised me, showing almost 2x drops. Only in RAID10 n2 do random reads skyrocket, while writes are equal to a single disk.
This makes me wonder whether RAID1 mode is really usable, and how it performs on regular hard drives or SSDs. The performance drop in the RAID settings is unexpected; I am working with Fusion-io engineers to figure out the issue. The next experiment I am going to look into is different page sizes.

Raw results (in requests/second; more is better):

                             read/1    read/16   write/1   write/16
single disk                  12765.49  31604.86  14357.65  32447.07
raid0, 2 disks               12046.12  57410.58  12993.91  43023.12
raid1, 2 disks               11484.17  51084.02  9821.12   15220.57
raid1 over raid0, 4 disks    10227.13  61392.25  7395.75   13536.86
raid0 over raid1, 4 disks    10810.08  66316.29  8830.49   18687.97
raid10 n2                    11612.89  99170.51  10634.62  31038.5

Script for reference:

#!/bin/sh
set -u
set -x
set -e

for size in 50G; do
  for mode in rndrd rndwr; do
    #for mode in rndwr; do
    #for blksize in 512 4096 8192 16384 32768 65536 ; do
    for blksize in 16384 ; do
      sysbench --test=fileio --file-num=64 --file-total-size=$size prepare
      #for threads in 1 4 8; do
      for threads in 1 16 ; do
        echo "====== testing $blksize in $threads threads"
        echo PARAMS $size $mode $threads $blksize > sysbench-size-$size-mode-$mode-threads-$threads-blksz-$blksize
        for i in 1 2 3 ; do
          sysbench --test=fileio --file-total-size=$size --file-test-mode=$mode \
            --max-time=180 --max-requests=100000000 --num-threads=$threads --init-rng=on \
            --file-num=64 --file-extra-flags=direct --file-fsync-freq=0 --file-block-size=$blksize run \
            | tee -a sysbench-size-$size-mode-$mode-threads-$threads-blksz-$blksize 2>&1
        done
      done
      sysbench --test=fileio --file-total-size=$size cleanup
    done
  done
done

(4 comments at http://www.mysqlperformanceblog.com/2010/03/24/raid-throughput-on-fusionio/#comments)
(again)";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:43:"http://www.lenzg.net/archives/295-guid.html";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:73:"http://www.lenzg.net/archives/295-Berkeley-DB-now-supports-SQL-again.html";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:2066:"Berkeley DB (BDB) is undoubtedly the workhorse among the opensource embedded database engines. It started as a university project in the mid-eighties and was further developed by Sleepycat Software, until it got acquired by Oracle in February 2006. I had the impression that BDB had lost a lot of its popularity among opensource developers to SQLite in recent times, which has evolved into becoming the default choice for developers looking for an embedded data store. I'd assume primarily because the code is not released under any particular license, but put in the public domain (which makes it very attractive for embedding it into one's code), and also because it's lightweight, supports SQL and has interfaces to a number of languages. Of course, SQLite has its limitations and use cases (as every product), so it may not be suited for some particular application. As the SQLite developers put it: "SQLite is not designed to replace Oracle. It is designed to replace fopen().". Yesterday, Oracle announced a new version of BDB. One of the notable features of this release is the introduction of a new SQL API, based on SQLite. According to Gregory Burd, Product Manager for Berkeley DB at Oracle, they did so by including a version of SQLite which uses Berkeley DB for storage (replacing btree.c). I think this is a very smart move – instead of introducing a new API, developers can now easily switch to a different storage backend in case they are experiencing issues with the default SQLite implementation. So now MySQL isn't the only database with different storage backends anymore I am curious to learn more about how the BDB implementation compares against the original (both feature- and performance-wise). Oh, and this is actually not the first time someone put an SQL interface in front of Berkeley DB – BDB was the first transaction-safe storage engine that provided page-level locking for MySQL in version 3.23.15 (released in May 2000). 
The InnoDB storage engine was added some time afterwards (MySQL 3.23.34a, released in March 2001).";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Wed, 24 Mar 2010 21:17:26 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:10:{i:0;a:5:{s:4:"data";s:5:"Linux";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:5:"MySQL";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:3:"OSS";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:3;a:5:{s:4:"data";s:3:"bdb";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:4;a:5:{s:4:"data";s:9:"databases";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:5;a:5:{s:4:"data";s:5:"mysql";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:6;a:5:{s:4:"data";s:6:"oracle";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:7;a:5:{s:4:"data";s:3:"oss";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:8;a:5:{s:4:"data";s:6:"sqlite";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:9;a:5:{s:4:"data";s:7:"storage";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:2988:"<p><a href="http://www.oracle.com/technology/products/berkeley-db/">Berkeley DB</a> (BDB) is undoubtedly the <a href="http://en.wikipedia.org/wiki/Berkeley_DB#Programs_that_use_Berkeley_DB">workhorse</a> among the opensource embedded database engines. It started as a university project in the mid-eighties and was further developed by <a href="http://en.wikipedia.org/wiki/Sleepycat">Sleepycat Software</a>, until it got acquired by Oracle in February 2006.</p> <p>I had the impression that BDB had lost a lot of its popularity among opensource developers to <a href="http://www.sqlite.org/">SQLite</a> in recent times, which has evolved into becoming the default choice for developers looking for an embedded data store. I'd assume primarily because the code is not released under any particular license, but put in the <a href="http://www.sqlite.org/copyright.html">public domain</a> (which makes it very attractive for embedding it into one's code), and also because it's lightweight, supports SQL and has interfaces to a number of languages.</p> <p>Of course, SQLite has its <a href="http://www.sqlite.org/limits.html">limitations</a> and <a href="http://sqlite.org/whentouse.html">use cases</a> (as every product), so it may not be suited for some particular application. As the SQLite developers put it: "SQLite is not designed to replace Oracle. It is designed to replace fopen().".</p> <p>Yesterday, Oracle <a href="http://www.oracle.com/us/corporate/press/063695">announced</a> a new version of BDB. One of the notable features of this release is the introduction of a new SQL API, based on SQLite. 
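
To make the "drop-in" idea concrete, here is a small sketch of mine (not from Lenz's post): the very same statements should work against a stock SQLite build and against the BDB-backed one, since only the storage layer beneath btree.c changes. The dbsql shell name is an assumption about what the BDB package ships; sqlite3 is the standard shell.

# the same SQL against the stock SQLite shell...
sqlite3 native.db "CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT);
                   INSERT INTO t VALUES (1, 'hello');
                   SELECT * FROM t;"
# ...and against the Berkeley DB-backed build ('dbsql' is an assumed binary name)
dbsql bdb.db      "CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT);
                   INSERT INTO t VALUES (1, 'hello');
                   SELECT * FROM t;"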
<a href="http://twitter.com/gregburd/statuses/10979336891">According to Gregory Burd</a>, Product Manager for Berkeley DB at Oracle, they did so by including a version of SQLite which uses Berkeley DB for storage (replacing btree.c). I think this is a very smart move – instead of introducing a new API, developers can now easily switch to a different storage backend in case they are experiencing issues with the default SQLite implementation. So now MySQL isn't the only database with different storage backends anymore <img src="http://www.lenzg.net/templates/default/img/emoticons/smile.png" alt=":-)" style="display: inline; vertical-align: bottom;" class="emoticon" /></p> <p>I am curious to learn more about how the BDB implementation compares against the original (both feature- and performance-wise).</p> <p>Oh, and this is actually not the first time someone put an SQL interface in front of Berkeley DB – BDB was the first transaction-safe storage engine that provided page-level locking for MySQL in version 3.23.15 (released in May 2000). The InnoDB storage engine was added some time afterwards (MySQL 3.23.34a, released in March 2001).</p><br/>PlanetMySQL Voting: <a href="http://planet.mysql.com/entry/vote/?entry_id=24023&vote=1&apivote=1">Vote UP</a> / <a href="http://planet.mysql.com/entry/vote/?entry_id=24023&vote=-1&apivote=1">Vote DOWN</a>";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:12:"Lenz Grimmer";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:43;a:6:{s:4:"data";s:43:" ";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:14:"xtrabackup-1.1";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:43:"http://www.mysqlperformanceblog.com/?p=2370";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:62:"http://www.mysqlperformanceblog.com/2010/03/24/xtrabackup-1-1/";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:1429:"Dear Community, It is time to announce the next version of backup software from Percona - XtraBackup 1.1. The list of changes in version 1.1 includes: Changelog: XtraBackup is built on a base of MySQL 5.1.44 with InnoDB plugin 1.0.6 Added --host option tar4ibd can treat over 64GB file tar4ibd is default method for stream, even tar is specified the binary supports compressed tables and Baraccuda format Fixed bugs: Bug #529874: innobackupex cannot treat exit code of xtrabackup executable Bug #498660: xtrabackup not handling barracuda compressed table format Bug #498660: innobackupex doesn't pass --defaults-files to mysql child proc Bug #510960: innobackupex --remote-host scp doesn't copy MyISAM files The binary packages for RHEL4,5, Debian, FreeBSD, Windows, Mac OS as well as source code of the XtraBackup is available on http://www.percona.com/percona-builds/XtraBackup/XtraBackup-1.1/. Debian and RPM are available in Percona repository. 
The project lives on Launchpad : https://launchpad.net/percona-xtrabackup and you can report bug to Launchpad bug system: https://launchpad.net/percona-xtrabackup/+filebug. The documentation is available on our Wiki. For general questions use our Pecona-discussions group, and for development question Percona-dev group. For support, commercial and sponsorship inquiries contact Percona. Entry posted by Aleksandr Kuzminsky | 5 comments Add to: | | | | ";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Wed, 24 Mar 2010 21:04:00 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:1:{i:0;a:5:{s:4:"data";s:5:"mysql";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:4265:"<p>Dear Community,</p> <p>It is time to announce the next version of backup software from Percona - XtraBackup 1.1.</p> <p>The list of changes in version 1.1 includes:<br /> <strong>Changelog:</strong></p> <ul> <li>XtraBackup is built on a base of MySQL 5.1.44 with InnoDB plugin 1.0.6</li> <li>Added --host option</li> <li>tar4ibd can treat over 64GB file</li> <li>tar4ibd is default method for stream, even tar is specified</li> <li>the binary supports compressed tables and Baraccuda format</li> </ul> <p><strong>Fixed bugs:</strong></p> <ul> <li><a href="https://bugs.launchpad.net/percona-xtrabackup/+bug/529874">Bug #529874: innobackupex cannot treat exit code of xtrabackup executable</a></li> <li><a href="https://bugs.launchpad.net/percona-xtrabackup/+bug/498660">Bug #498660: xtrabackup not handling barracuda compressed table format</a></li> <li><a href="https://bugs.launchpad.net/percona-xtrabackup/+bug/498660">Bug #498660: innobackupex doesn't pass --defaults-files to mysql child proc</a></li> <li><a href="https://bugs.launchpad.net/percona-xtrabackup/+bug/510960">Bug #510960: innobackupex --remote-host scp doesn't copy MyISAM files</a></li> </ul> <p>The binary packages for RHEL4,5, Debian, FreeBSD, Windows, Mac OS as well as source code of the XtraBackup is available on <a href="http://www.percona.com/percona-builds/XtraBackup/XtraBackup-1.1/">http://www.percona.com/percona-builds/XtraBackup/XtraBackup-1.1/</a>.</p> <p>Debian and RPM are available in <a href="http://www.percona.com/docs/wiki/release%3Astart#percona_apt_repository">Percona repository</a>.</p> <p>The project lives on Launchpad : <a href="https://launchpad.net/percona-xtrabackup">https://launchpad.net/percona-xtrabackup</a> and you can report bug to Launchpad bug system:<br /> <a href="https://launchpad.net/percona-xtrabackup/+filebug">https://launchpad.net/percona-xtrabackup/+filebug</a>. 
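
For readers who have not used the tool before, a minimal hot-backup invocation looks roughly like this (my sketch, not part of the announcement; the user, password, and paths are placeholders, and the changelog above lists --host as new in 1.1):

# full non-blocking backup of a running server into a timestamped subdirectory;
# --host directs the backup at a specific server rather than the local default
innobackupex --user=backup --password=secret --host=127.0.0.1 /data/backups/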
The documentation is available on <a href="http://www.percona.com/docs/wiki/percona-xtrabackup%3Astart">our Wiki</a>.</p> <p>For general questions use our <a href="http://groups.google.com/group/percona-discussion">Pecona-discussions</a> group, and for development question <a href="http://groups.google.com/group/percona-dev">Percona-dev</a> group.</p> <p>For support, commercial and sponsorship inquiries contact <a href="http://www.percona.com/contacts.html">Percona</a>.</p> <hr noshade style="margin:0;height:1px" /> <p>Entry posted by Aleksandr Kuzminsky | <a href="http://www.mysqlperformanceblog.com/2010/03/24/xtrabackup-1-1/#comments">5 comments</a></p> <p>Add to: <a href="http://del.icio.us/post?url=http://www.mysqlperformanceblog.com/2010/03/24/xtrabackup-1-1/&title=xtrabackup-1.1" title="Bookmark this post on del.icio.us"><img src="http://www.mysqlperformanceblog.com/wp-content/themes/boxy-but-gold/images/delicious.png" alt="delicious" /></a> | <a href="http://digg.com/submit?phase=2&url=http://www.mysqlperformanceblog.com/2010/03/24/xtrabackup-1-1/&title=xtrabackup-1.1" title="Digg this post on Digg.com"><img src="http://www.mysqlperformanceblog.com/wp-content/themes/boxy-but-gold/images/digg.png" alt="digg" /></a> | <a href="http://reddit.com/submit?url=http://www.mysqlperformanceblog.com/2010/03/24/xtrabackup-1-1/&title=xtrabackup-1.1" title="Submit this post on reddit.com"><img src="http://www.mysqlperformanceblog.com/wp-content/themes/boxy-but-gold/images/reddit.png" alt="reddit" /></a> | <a href="http://www.netscape.com/submit/?U=http://www.mysqlperformanceblog.com/2010/03/24/xtrabackup-1-1/&T=xtrabackup-1.1" title="Vote for this article on Netscape"><img src="http://www.mysqlperformanceblog.com/wp-content/themes/boxy-but-gold/images/netscape.gif" alt="netscape" /></a> | <a href="http://www.google.com/bookmarks/mark?op=add&bkmk=http://www.mysqlperformanceblog.com/2010/03/24/xtrabackup-1-1/&title=xtrabackup-1.1" title="Add to Google Bookmarks"><img src="http://www.mysqlperformanceblog.com/wp-content/themes/boxy-but-gold/images/google.png" alt="Google Bookmarks" /></a></p><br/>PlanetMySQL Voting: <a href="http://planet.mysql.com/entry/vote/?entry_id=24021&vote=1&apivote=1">Vote UP</a> / <a href="http://planet.mysql.com/entry/vote/?entry_id=24021&vote=-1&apivote=1">Vote DOWN</a>";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:22:"MySQL Performance Blog";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:44;a:6:{s:4:"data";s:53:" ";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:35:"Rendering Trees with Closure Tables";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:70:"tag:blogger.com,1999:blog-5445766604096569596.post-6402470695723624702";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:75:"http://karwin.blogspot.com/2010/03/rendering-trees-with-closure-tables.html";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:392:"I got a comment from a reader about the Naive Trees section of my 
presentation SQL Antipatterns Strike Back. I've given this presentation at the MySQL Conference & Expo in the past.I'd also like to mention that I've developed these ideas into a new book, SQL Antipatterns: Avoiding the Pitfalls of Database Programming. The book is now available in Beta and for pre-order from Pragmatic";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Wed, 24 Mar 2010 20:55:00 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:3:{i:0;a:5:{s:4:"data";s:3:"sql";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:1;a:5:{s:4:"data";s:5:"mysql";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}i:2;a:5:{s:4:"data";s:5:"trees";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:605:"I got a comment from a reader about the Naive Trees section of my presentation SQL Antipatterns Strike Back. I've given this presentation at the MySQL Conference & Expo in the past.I'd also like to mention that I've developed these ideas into a new book, SQL Antipatterns: Avoiding the Pitfalls of Database Programming. The book is now available in Beta and for pre-order from Pragmatic<br/>PlanetMySQL Voting: <a href="http://planet.mysql.com/entry/vote/?entry_id=24022&vote=1&apivote=1">Vote UP</a> / <a href="http://planet.mysql.com/entry/vote/?entry_id=24022&vote=-1&apivote=1">Vote DOWN</a>";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:11:"Bill Karwin";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:45;a:6:{s:4:"data";s:53:" ";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:38:"MySQL Cluster 7.1.2a binaries released";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:32:"http://www.clusterdb.com/?p=1004";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:78:"http://www.clusterdb.com/mysql-cluster/mysql-cluster-7-1-2a-binaries-released/";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:515:"The binary version for MySQL Cluster 7.1.2a has now been made available at http://dev.mysql.com/downloads/cluster/ under the Development tab. Note that this beta load contains the latest NDBINFO and MySQL Cluster Connector for Java (ClusterJ) enhancements – please try them out and provide feedback (any bugs should be reported through bugs.mysql.com. 
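
The pattern the title refers to stores every ancestor/descendant pair of the tree (including each node paired with itself), so subtree queries become plain joins instead of recursive traversals. A hedged sketch of the idea (mine, not code from the post or the book; table and column names are illustrative):

mysql test <<'SQL'
CREATE TABLE comments (comment_id INT PRIMARY KEY, body TEXT);
CREATE TABLE tree_paths (
    ancestor   INT NOT NULL,   -- every ancestor of descendant, plus descendant itself
    descendant INT NOT NULL,
    PRIMARY KEY (ancestor, descendant)
);
-- the whole subtree under comment 4: one indexed join, no recursion
SELECT c.*
  FROM comments AS c
  JOIN tree_paths AS t ON c.comment_id = t.descendant
 WHERE t.ancestor = 4;
SQL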

MySQL Cluster 7.1.2a binaries released
Andrew Morgan | Wed, 24 Mar 2010 19:31:46 +0000 | http://www.clusterdb.com/mysql-cluster/mysql-cluster-7-1-2a-binaries-released/
Tags: MySQL Cluster, MySQL, MySQL Cluster 7.1

The binary version of MySQL Cluster 7.1.2a has now been made available at http://dev.mysql.com/downloads/cluster/ under the Development tab.

Note that this beta load contains the latest NDBINFO and MySQL Cluster Connector for Java (ClusterJ) enhancements. Please try them out and provide feedback (any bugs should be reported through bugs.mysql.com).

A description of all of the changes (fixes) that have gone into MySQL Cluster 7.1.2a (compared to 7.1.1) can be found in the MySQL Cluster 7.1.2a Change Log (http://www.clusterdb.com/wp-content/uploads/2010/03/MySQL_Cluster_7_1_2a_ChangeLog.txt).


LOAD DATA and recovery
Scott Noyes | Wed, 24 Mar 2010 17:39:32 +0000 | http://thenoyes.com/littlenoise/?p=104
Tags: MySQL Gotchas

A little two-part quiz. If you get the first one without peeking, you're worth your pay as a DBA. If you get the second one without peeking, you may tell your boss that some random guy on the Internet says you deserve a raise.
Start with a text file, 'test.txt', with these three lines:

1
1
2

Set up the test in MySQL:

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (id int primary key);
LOAD DATA INFILE 'test.txt' INTO TABLE t1;

This gives "ERROR 1062 (23000): Duplicate entry '1' for key 'PRIMARY'", which is expected.

What's in the table?

Answer: it depends. If the engine is MyISAM, you'll have one row: the first '1' from the file was inserted and everything after it was skipped. If the engine is InnoDB, you'll have no rows, because the transaction rolls back. So: either 1 row or 0 rows.

Now pretend you're setting up a slave, or there was a crash and you're recovering from the binary logs:

mysqlbinlog bin.000001 | mysql

How many rows are in t1 now?

Answer:

mysql> SELECT * FROM t1;
+----+
| id |
+----+
|  1 |
|  2 |
+----+
2 rows in set (0.00 sec)

Why? The manual says, "mysqlbinlog converts LOAD DATA INFILE statements to LOAD DATA LOCAL INFILE statements," and, "with LOCAL, the default duplicate-key handling behavior is the same as if IGNORE is specified."
http://dev.mysql.com/doc/refman/5.1/en/mysqlbinlog.html
http://dev.mysql.com/doc/refman/5.1/en/load-data.html

Note that a replicating slave will handle it correctly: if the master used LOCAL (and therefore IGNORE), the slave will do IGNORE. If the master did not use LOCAL or IGNORE and so got the error above, the slave will do the same, and the data will match. So be advised: replication and "mysqlbinlog | mysql" may not give the same results.
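
To see the conversion's effect for yourself, replay the statement the way mysqlbinlog rewrites it (a sketch of mine, not from the post; the client may need local_infile enabled, e.g. mysql --local-infile=1):

# LOCAL implies IGNORE on duplicate keys, so this succeeds where the
# original LOAD DATA INFILE raised ERROR 1062
mysql test -e "TRUNCATE t1;
               LOAD DATA LOCAL INFILE 'test.txt' INTO TABLE t1;
               SELECT * FROM t1;"    # rows 1 and 2; the duplicate '1' is silently skipped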

How to find MySQL developers?
Ronald Bradford | Wed, 24 Mar 2010 17:07:42 +0000 | http://ronaldbradford.com/blog/how-to-find-mysql-developers-2010-03-24/
Tags: Databases, Drizzle, MySQL, Professional

Brian wrote recently Where did all of the MySQL Developers Go? (http://krow.livejournal.com/687521.html), while over in Drizzle land they have been accepted for the Google Summer of Code (http://drizzle.org/wiki/Soc) along with many other open source projects. MySQL was, from my observation, a noticeable absentee.

Historically, the lack of opportunity for the community to contribute and see those contributions implemented in, say, under 5 years has really hurt MySQL in recent times. There is plenty of history here, so that's not worth repeating. The current landscape of patches, forks, and custom MySQL binaries from storage engine providers has produced a boom of innovation that sadly is now lost from the core MySQL product.

In Drizzle, community contribution is actively sought, and a good portion of committed code does not come from the core Drizzle developers (wherever they work). Padraig (http://posulliv.github.com/), for example, a Drizzle GSoC contributor last year, is helping to mentor this year. The Drizzle project's contribution philosophy, GSoC, and other activities such as the Drizzle Developer Day all enable the next generation of developers to be part of ongoing project development.

Oracle, what are you going to do to foster an active community and new long-term developers for MySQL?
Facebook";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:33:"http://pooteeweet.org/blog/0/1713";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:33:"http://pooteeweet.org/blog/0/1713";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:2087:"Ages ago I created a Twitter account to get some free app. I figured some nobody was following me I didn't have to feel like a guilty spammer. For some odd obsession to honesty probably I did use my proper name and sooner or later people started following me despite me having only put out a single spam message. So on very few occasions I tried out tweeting (still feels weird using that word) since then, obviously I have never used it to get a free app again by spamming. Anyways I have now decided that for small blurps about technical stuff I will from now on use Twitter, thereby sparing my Facebook friends from such gibberish. In turn my developer friends on Facebook that do not care about what I have to say about Frisbee, DJing or politics can start removing me from Facebook. Actually I might just do this myself, because its FUCKING ANNOYING that so many people multi spam their status messages to Twitter and Facebook. Anyways the choice for how to split things up was obvious seeing that the bulk of my friends on Facebook are not developers and the bulk of my friends/followers on Twitter are developers. I guess this is my way of fishing for some more Twitter followers so that I have a decent audience to communicate with, since right now I still have more developer connections on Facebook than on Twitter. You can expect to see some random PHP, MySQL, Solr etc. tidbits to start appearing on my Twitter account from now on like ZF wtf's or a reminder that we now have a replacement for @fopen hacks in our autoloaders. By the way, the account name "dybvandal" originates from my QuakeWorld gaming days. My friends actually just got active again playing as Clan Dybbuk. My nick was Vandal inspired by a character in a pen and paper adventure, Shadowrun to be exact. Vandal was the name of a timid elf and I just thought it was a funny combination "timid" and "vandal". Heh, that website runs on some ancient code, that is probably quite embarrassing if I would read it. Definitely before I started using mod_rewrite to create nice URLs.";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:7:"pubDate";a:1:{i:0;a:5:{s:4:"data";s:31:"Wed, 24 Mar 2010 16:10:02 +0000";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:8:"category";a:1:{i:0;a:5:{s:4:"data";s:7:"general";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:40:"http://purl.org/rss/1.0/modules/content/";a:1:{s:7:"encoded";a:1:{i:0;a:5:{s:4:"data";s:2597:"<p>Ages ago I created <a href="http://twitter.com/dybvandal">a Twitter account</a> to get some free app. I figured some nobody was following me I didn't have to feel like a guilty spammer. For some odd obsession to honesty probably I did use my proper name and sooner or later people started following me despite me having only put out a single spam message. 
So on very few occasions I tried out tweeting (still feels weird using that word) since then, obviously I have never used it to get a free app again by spamming. Anyways I have now decided that for small blurps about technical stuff I will from now on use Twitter, thereby sparing my Facebook friends from such gibberish. In turn my developer friends on Facebook that do not care about what I have to say about Frisbee, DJing or politics can start removing me from Facebook. Actually I might just do this myself, because its FUCKING ANNOYING that so many people multi spam their status messages to Twitter and Facebook.</p> <p>Anyways the choice for how to split things up was obvious seeing that the bulk of my friends on Facebook are not developers and the bulk of my friends/followers on Twitter are developers. I guess this is my way of fishing for some more Twitter followers so that I have a decent audience to communicate with, since right now I still have more developer connections on Facebook than on Twitter. You can expect to see some random PHP, MySQL, Solr etc. tidbits to start appearing on my Twitter account from now on like <a href="http://twitter.com/dybvandal/status/10924705700">ZF wtf's</a> or a reminder that we now have <a href="http://twitter.com/dybvandal/status/10930987809">a replacement for @fopen hacks in our autoloaders</a>.</p> <p>By the way, the account name "dybvandal" originates from my QuakeWorld gaming days. My friends actually just got active again playing as <a href="http://dybbuk.de/">Clan Dybbuk</a>. My nick was <a href="http://dybbuk.de/index.php?ext_language=de&ext_page=vandal">Vandal</a> inspired by a character in a pen and paper adventure, Shadowrun to be exact. Vandal was the name of a timid elf and I just thought it was a funny combination "timid" and "vandal". Heh, that website runs on some ancient code, that is probably quite embarrassing if I would read it. Definitely before I started using mod_rewrite to create nice URLs.</p><br/>PlanetMySQL Voting: <a href="http://planet.mysql.com/entry/vote/?entry_id=24015&vote=1&apivote=1">Vote UP</a> / <a href="http://planet.mysql.com/entry/vote/?entry_id=24015&vote=-1&apivote=1">Vote DOWN</a>";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}s:32:"http://purl.org/dc/elements/1.1/";a:1:{s:7:"creator";a:1:{i:0;a:5:{s:4:"data";s:11:"Lukas Smith";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}}}}i:49;a:6:{s:4:"data";s:63:" ";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";s:5:"child";a:3:{s:0:"";a:6:{s:5:"title";a:1:{i:0;a:5:{s:4:"data";s:73:"My thoughts on Ada Lovelace Day, A candid conversation with Sheeri Cabral";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"guid";a:1:{i:0;a:5:{s:4:"data";s:36:"http://www.pythian.com/news/?p=10385";s:7:"attribs";a:1:{s:0:"";a:1:{s:11:"isPermaLink";s:5:"false";}}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:4:"link";a:1:{i:0;a:5:{s:4:"data";s:66:"http://www.pythian.com/news/10385/my-thoughts-on-ada-lovelace-day/";s:7:"attribs";a:0:{}s:8:"xml_base";s:0:"";s:17:"xml_base_explicit";b:0;s:8:"xml_lang";s:0:"";}}s:11:"description";a:1:{i:0;a:5:{s:4:"data";s:8784:"I had an interesting conversation with Sheeri yesterday. 

My thoughts on Ada Lovelace Day, A candid conversation with Sheeri Cabral
Paul Vallee | Wed, 24 Mar 2010 15:18:52 +0000 | http://www.pythian.com/news/10385/my-thoughts-on-ada-lovelace-day/
Tags: Non-Tech Articles, Technical Blog, ada lovelace day, women

I had an interesting conversation with Sheeri yesterday. She had pointed out that today was Ada Lovelace Day (http://findingada.com), a day devoted to highlighting and thanking the many women in the Information Technology industry for their contributions. She suggested that if I wanted to blog about it she would find that appropriate, given what we've achieved here at Pythian.

First, I consider that a huge compliment. And then, a distant second, I told Sheeri: no, I don't think I'll blog about it, that's not my thing.

This is the IM conversation that came out of that email exchange when Sheeri and I connected about an hour later. You may or may not find it interesting, but ultimately I thought it was interesting enough to share.

tl;dr: Happy Ada Lovelace Day.

Expanded version:

Paul Vallee: hey sheeri!
Sheeri K. Cabral: heya!
Paul Vallee: so a couple quick notes about ada lovelace then i'll drop it ok :) i want to keep the blog as much as possible as a personal voice. that means you get to maintain your voice, and i, mine. i want to avoid overusing it as "corporate message speak"
Sheeri K. Cabral: that's fair. the day is to draw attention to the achievements of women in tech.
Paul Vallee: we have the news release section of the site for that
Sheeri K. Cabral: yeah, I don't quite think this is news-release-worthy
Paul Vallee: personally, it makes me intensely queasy to single out any group. the reason it's comfortable here is because your sex DOESN'T matter. i don't want to post on any [insert minority here] IT day. I'm done with all that. my contribution is to make a place where diversity and tolerance are cultural imperatives of the first order
Sheeri K. Cabral: so then post that :) because you have made that place at Pythian
Paul Vallee: and the goal of that is to ... make it appropriately irrelevant as a subject for discussion or highlighting, without listing how much of a % of the company is female, etc. sheeri, it's just not my subject, it's not my voice, for my blog. i won't be posting about it :) i hope you'll forgive me
Sheeri K. Cabral: I'll forgive you.
Paul Vallee: the closest i've ever come was on carl sagan's anniversary of his death
Sheeri K. Cabral: I just hope that your kids and partner don't feel the same about Father's Day. because you don't need Father's Day. you're a great father every day, and your kids and partner should appreciate you every day
Paul Vallee: http://www.pythian.com/news/341/the-fine-art-of-baloney-detection-in-honour-of-dr-carl-sagan/
Sheeri K. Cabral: but sometimes it's nice to have a special day.
Paul Vallee: sheeri :)
Sheeri K. Cabral: like when someone brings in cookies because your mother gave birth to you.
Paul Vallee: that's not forgiving me heh
Sheeri K. Cabral: I'm giving you my perspective. there's nothing to forgive, because you aren't doing anything wrong.
Paul Vallee: i struggle with this, i do. but ultimately, i rely on this principle: when folks wanted you to go public with your opinion on the mysql merger, you decided to do it with no pressure from me to write or not either way, and i didn't interfere with your opinion one way or another. and as a result, it came out as pure you, in your voice. that's how the blog should be. for all of my failings, i can't go there
Sheeri K. Cabral: I understand. I made a suggestion and you said "nope, not my thing". that's why there's nothing to forgive
Paul Vallee: but that's not me
Sheeri K. Cabral: look, I struggle with this because I'm a success regardless of my gender.
Paul Vallee: EXACTLY
Sheeri K. Cabral: and if I was bad at my job it would also be regardless of my gender.
Paul Vallee: EXACTLY. so don't you think this day is misguided, somehow?
Sheeri K. Cabral: it's like affirmative action. well, almost like it
Paul Vallee: i mean it's not like secretary's day. it's about "men who are secretaries" day. that's just ... not going to help
Sheeri K. Cabral: it's giving a boost to a minority, because they come from a disadvantaged place.
Paul Vallee: yeah that's right, it's not like father's day at all, is it
Sheeri K. Cabral: well, here's the thing. the day itself is to blog to draw attention to women in tech and science. so in that way, the actions you take are a celebration
Paul Vallee: but don't you see the condescension that i see?
Sheeri K. Cabral: in the day itself? I see how one can read condescension there. but you could do a great blog post that says "screw this gender crap. Women at Pythian get paid the same and treated the same as men. Period."
Paul Vallee: LOL
Sheeri K. Cabral: Here's the thing -- the message is "women have a hard time in IT". the very message of the day is that. and I hate people asking me what it's like to be a woman in IT, because it's a stupid question to me. what's it like being a man in IT? I can type without my boobs getting in the way. That's the only thing I can think of to say.
Paul Vallee: LOL. ok, here's my point of view. i think the question "what's it like to be a woman in IT" is itself problematic. we can't move on from this until we make it disappear. canadians and americans have a different approach to racism/sexism/etc. by the way. also to religion. here, it is socially unacceptable to voice a sexist or racist thought or comment, or to single people out specifically. there, it comes up a lot. i'm not sure why that is, but it goes really deep. like, i know our prime minister is a religious man, it's not a secret, but you'll never, ever, ever hear him refer to his religion publicly. these are things that are in the private sphere. we force them to disappear from our process specifically by not mentioning them. that is how we make everybody comfortable, for the most part. if the PM were to mention his religion, he would by the very statement make others feel excluded or certain people singled out for special treatment or affection
Sheeri K. Cabral: or it becomes the elephant in the room.
Paul Vallee: yeah, maybe so, maybe so. who am i to say which approach is most successful.
Sheeri K. Cabral: there's a difference between "god wants me to rule this way" and saying "I went to church yesterday" though.
Paul Vallee: India is trying something new politically directly related to this, by the way. they are assigning a third of the seats in their congress (I think) to women.
Sheeri K. Cabral: yeah, I saw that.
Paul Vallee: I wonder why a third (it should be half for purely technical biological reasons). I think if they made it half, I would be OK with it.
Sheeri K. Cabral: and to be more of a tangent ... my question is also "how many women is equal"? Honestly, I don't think women *in general* like to geek out as much. *in general* we're not as competitive. we don't care about a lot of the stuff that drives geeks (females and males) to success ... again, in general. so one can argue that 10% isn't "enough", but I don't think the "right" gender balance is 50/50 anyway.
Paul Vallee: Well, I think women and men have different thought processes, sure, but some of that would tilt the scales towards women as DBAs, you know. women rate better for recovering from sleep inertia, for instance
Sheeri K. Cabral: just the other day I realized that when playing games, especially video games, I like games you can actually win, not merely competitive stuff. ie, you don't win or lose at Tetris, or Ms. Pac Man, you just keep going. card games, for instance, you win or lose at.
Paul Vallee: True - tie this back in for me?
Sheeri K. Cabral: and indeed I wondered if there was a gender tie-in there. so "geeking out" - finding the fastest disk speed, what's the best filesystem for an ssd drive, etc. - is similar to "male posturing", which is more of a guy thing. women (in general) are more apt to say "is it good enough?", not "is it the best!", because we don't want to rathole forever. (I guess my point is I think there are strengths and weaknesses each gender brings to the table.) but overall, if that blog post isn't your cup of tea then that's fine. :)
Paul Vallee: I'm thinking of posting this IM transcript :)

... and so, with the lightest of edits and with Sheeri's permission, I have! Happy Ada Lovelace Day, everyone.