Migration steps
This page describes the steps to perform and verify a successful migration to YugabyteDB. Before proceeding with migration, ensure that you have completed the following steps:
- Install yb-voyager.
- Check the unsupported features and known issues.
- Review data modeling strategies.
- Prepare the source database.
- Prepare the target database.
Prepare the source database
Prepare your source database by creating a new database user with READ access to all the resources that need to be migrated.

For PostgreSQL, create a database user and provide the user with READ access to all the resources which need to be migrated. Run the following commands in a `psql` session:
- Create a new user named `ybvoyager`:

    ```sql
    CREATE USER ybvoyager PASSWORD 'password';
    ```
- Switch to the database that you want to migrate:

    ```sql
    \c <database_name>
    ```
- Grant the `USAGE` permission to the `ybvoyager` user on all schemas of the database:

    ```sql
    SELECT 'GRANT USAGE ON SCHEMA ' || schema_name || ' TO ybvoyager;' FROM information_schema.schemata; \gexec
    ```

    The preceding `SELECT` statement generates a list of `GRANT USAGE` statements, which are then executed by `psql` because of the `\gexec` switch. The `\gexec` switch works for PostgreSQL v9.6 and later. For older versions, you'll have to manually execute the `GRANT USAGE ON SCHEMA schema_name TO ybvoyager` statement for each schema in the source PostgreSQL database.

- Grant `SELECT` permission on all the tables and sequences:

    ```sql
    SELECT 'GRANT SELECT ON ALL TABLES IN SCHEMA ' || schema_name || ' TO ybvoyager;' FROM information_schema.schemata; \gexec

    SELECT 'GRANT SELECT ON ALL SEQUENCES IN SCHEMA ' || schema_name || ' TO ybvoyager;' FROM information_schema.schemata; \gexec
    ```
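The `\gexec` pattern above simply string-builds one `GRANT` statement per schema and hands the generated statements back to `psql` for execution. As a rough illustration of what gets generated (the helper below is hypothetical, not part of yb-voyager):

```python
def generate_grants(schemas, user="ybvoyager"):
    """Build one GRANT USAGE statement per schema, mirroring what the
    SELECT ... || schema_name || ... query generates for \\gexec."""
    return [f"GRANT USAGE ON SCHEMA {schema} TO {user};" for schema in schemas]

for statement in generate_grants(["public", "sales"]):
    print(statement)
# -> GRANT USAGE ON SCHEMA public TO ybvoyager;
# -> GRANT USAGE ON SCHEMA sales TO ybvoyager;
```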
The `ybvoyager` user can now be used for migration.
For MySQL, create a database user and provide the user with READ access to all the resources which need to be migrated. Replace `<client_IP>` in the following commands with an appropriate hostname or IP address for your setup.
- Create a new user `ybvoyager`:

    ```sql
    CREATE USER 'ybvoyager'@'<client_IP>' IDENTIFIED WITH mysql_native_password BY 'Password#123';
    ```
- Grant the global `PROCESS` permission:

    ```sql
    GRANT PROCESS ON *.* TO 'ybvoyager'@'<client_IP>';
    ```
- Grant the `SELECT`, `SHOW VIEW`, and `TRIGGER` permissions on the source database:

    ```sql
    GRANT SELECT ON source_db_name.* TO 'ybvoyager'@'<client_IP>';
    GRANT SHOW VIEW ON source_db_name.* TO 'ybvoyager'@'<client_IP>';
    GRANT TRIGGER ON source_db_name.* TO 'ybvoyager'@'<client_IP>';
    ```

    Note that if you want to accelerate data export, additionally include the following grants.

    For MySQL:

    ```sql
    GRANT FLUSH_TABLES ON *.* TO 'ybvoyager'@'<client_IP>';
    GRANT REPLICATION CLIENT ON *.* TO 'ybvoyager'@'<client_IP>';
    ```

    For MySQL RDS:

    ```sql
    GRANT FLUSH_TABLES ON *.* TO 'ybvoyager'@'<client_IP>';
    GRANT REPLICATION CLIENT ON *.* TO 'ybvoyager'@'<client_IP>';
    GRANT LOCK TABLES ON <source_db_name>.* TO 'ybvoyager'@'<client_IP>';
    ```
- If you are running MySQL version 8.0.20 or later, grant the global `SHOW_ROUTINE` permission. For older versions, grant the global `SELECT` permission. These permissions are necessary to dump stored procedure and function definitions.

    ```sql
    -- For MySQL >= 8.0.20
    GRANT SHOW_ROUTINE ON *.* TO 'ybvoyager'@'<client_IP>';

    -- For older versions
    GRANT SELECT ON *.* TO 'ybvoyager'@'<client_IP>';
    ```
The `ybvoyager` user can now be used for migration.
For Oracle, create a role and a database user, and provide the user with READ access to all the resources which need to be migrated.
- Create a role that has the privileges listed in the following table:

    | Permission | Object type in the source schema |
    | --- | --- |
    | SELECT | VIEW, SEQUENCE, TABLE PARTITION, TABLE, SYNONYM, MATERIALIZED VIEW |
    | EXECUTE | TYPE |

    Change `<SCHEMA_NAME>` appropriately in the following snippets, and run the following steps as a privileged user.

    ```sql
    CREATE ROLE <SCHEMA_NAME>_reader_role;

    BEGIN
        FOR R IN (SELECT owner, object_name
                  FROM all_objects
                  WHERE owner = UPPER('<SCHEMA_NAME>')
                    AND object_type IN ('VIEW', 'SEQUENCE', 'TABLE PARTITION', 'SYNONYM', 'MATERIALIZED VIEW'))
        LOOP
            EXECUTE IMMEDIATE 'grant select on '||R.owner||'."'||R.object_name||'" to <SCHEMA_NAME>_reader_role';
        END LOOP;
    END;
    /

    BEGIN
        FOR R IN (SELECT owner, object_name
                  FROM all_objects
                  WHERE owner = UPPER('<SCHEMA_NAME>') AND object_type = 'TABLE'
                  MINUS
                  SELECT owner, table_name
                  FROM all_nested_tables
                  WHERE owner = UPPER('<SCHEMA_NAME>'))
        LOOP
            EXECUTE IMMEDIATE 'grant select on '||R.owner||'."'||R.object_name||'" to <SCHEMA_NAME>_reader_role';
        END LOOP;
    END;
    /

    BEGIN
        FOR R IN (SELECT owner, object_name
                  FROM all_objects
                  WHERE owner = UPPER('<SCHEMA_NAME>') AND object_type = 'TYPE')
        LOOP
            EXECUTE IMMEDIATE 'grant execute on '||R.owner||'."'||R.object_name||'" to <SCHEMA_NAME>_reader_role';
        END LOOP;
    END;
    /

    GRANT SELECT_CATALOG_ROLE TO <SCHEMA_NAME>_reader_role;
    GRANT SELECT ANY DICTIONARY TO <SCHEMA_NAME>_reader_role;
    GRANT SELECT ON SYS.ARGUMENT$ TO <SCHEMA_NAME>_reader_role;
    ```
- Create a user `ybvoyager` and grant `CONNECT` and `<SCHEMA_NAME>_reader_role` to the user:

    ```sql
    CREATE USER ybvoyager IDENTIFIED BY password;
    GRANT CONNECT TO ybvoyager;
    GRANT <SCHEMA_NAME>_reader_role TO ybvoyager;
    ```

    If you're using accelerated data export, run the additional grant as follows:

    ```sql
    GRANT FLASHBACK ANY TABLE TO ybvoyager;
    ```
- If you're using accelerated data export, the `log_mode` should be `archivelog`.

    - Check the value of `log_mode` using the following command for Oracle or Oracle RDS:

        ```sql
        SELECT LOG_MODE FROM V$DATABASE;
        ```

    - If the `log_mode` value is `NOARCHIVELOG`, run the following commands.

        For Oracle:

        ```sql
        sqlplus /nolog
        SQL> alter system set db_recovery_file_dest_size = 10G;
        SQL> alter system set db_recovery_file_dest = '<oracle_path>/oradata/recovery_area' scope=spfile;
        SQL> connect / as sysdba
        SQL> shutdown immediate
        SQL> startup mount
        SQL> alter database archivelog;
        SQL> alter database open;
        ```

        For Oracle RDS:

        ```sql
        exec rdsadmin.rdsadmin_util.set_configuration('archivelog retention hours', 24);
        ```

    - Verify the setting using `archive log list`.
If you want yb-voyager to connect to the source database over SSL, refer to SSL Connectivity.
Connecting to Oracle instances
You can use only one of the following arguments to connect to your Oracle instance.
Prepare the target database
Prepare your target YugabyteDB cluster by creating a database, and a user for your cluster.
Create the target database
Create the target database in your YugabyteDB cluster. The database name can be the same as or different from the source database name. If the target database name is not provided, yb-voyager assumes the target database name to be `yugabyte`. If you do not want to import into the default `yugabyte` database, specify the name of the target database using the `--target-db-name` argument of the `yb-voyager import` commands.

```sql
CREATE DATABASE target_db_name;
```
Create a user
Create a user with a `SUPERUSER` role.
- For a local YugabyteDB cluster or YugabyteDB Anywhere, create a user and role with the superuser privileges using the following command:

    ```sql
    CREATE USER ybvoyager SUPERUSER PASSWORD 'password';
    ```

- For YugabyteDB Managed, create a user with the `yb_superuser` role using the following commands:

    ```sql
    CREATE USER ybvoyager PASSWORD 'password';
    GRANT yb_superuser TO ybvoyager;
    ```
If you want yb-voyager to connect to the target database over SSL, refer to SSL Connectivity.
Deleting the ybvoyager user
After migration, all the migrated objects (tables, views, and so on) are owned by the `ybvoyager` user. You should transfer the ownership of the objects to some other user (for example, `yugabyte`) and then delete the `ybvoyager` user. Example steps to delete the user are:

```sql
REASSIGN OWNED BY ybvoyager TO yugabyte;
DROP OWNED BY ybvoyager;
DROP USER ybvoyager;
```
Create an export directory
yb-voyager keeps all of its migration state, including exported schema and data, in a local directory called the export directory.
Before starting migration, you should create the export directory on a file system that has enough space to hold the entire source database. Next, provide the path of the export directory as a mandatory argument (`--export-dir`) to each invocation of the yb-voyager command; you can store the path in an environment variable for convenience:
```sh
mkdir $HOME/export-dir
export EXPORT_DIR=$HOME/export-dir
```
The export directory has the following sub-directories and files:
- The `reports` directory contains the generated Schema Analysis Report.
- The `schema` directory contains the source database schema translated to PostgreSQL. The schema is partitioned into smaller files by schema object type, such as tables, views, and so on.
- The `data` directory contains TSV (Tab-Separated Values) files that are passed to the COPY command on the target database.
- The `metainfo` and `temp` directories are used by yb-voyager for internal bookkeeping.
- The `yb-voyager.log` file contains log messages.
Migrate your database to YugabyteDB
Proceed with schema and data migration using the following steps:
Export and analyze schema
To begin, export the schema from the source database. Once exported, analyze the schema and apply any necessary manual changes.
Export schema
The `yb-voyager export schema` command extracts the schema from the source database, converts it into PostgreSQL format (if the source database is Oracle or MySQL), and dumps the SQL DDL files in the `EXPORT_DIR/schema/*` directories.
Renaming index names for MySQL
YugabyteDB Voyager renames indexes for MySQL migrations while exporting the schema. MySQL allows two or more indexes in the same database to have the same name, provided they are for different tables. Like PostgreSQL, YugabyteDB does not support duplicate index names in the same schema. To avoid index name conflicts during schema export, yb-voyager prefixes each index name with the associated table name.

Usage for source_db_schema
The `source-db-schema` argument specifies the schema of the source database.

- For MySQL, the `source-db-schema` argument is currently not applicable.
- For PostgreSQL, `source-db-schema` can take one or more schema names, separated by commas.
- For Oracle, `source-db-schema` can take only one schema name, and you can migrate only one schema at a time.
An example invocation of the command is as follows:
```sh
# Replace the argument values with those applicable for your migration.
yb-voyager export schema --export-dir <EXPORT_DIR> \
        --source-db-type <SOURCE_DB_TYPE> \
        --source-db-host <SOURCE_DB_HOST> \
        --source-db-user <SOURCE_DB_USER> \
        --source-db-password <SOURCE_DB_PASSWORD> \ # Enclose the password in single quotes if it contains special characters.
        --source-db-name <SOURCE_DB_NAME> \
        --source-db-schema <SOURCE_DB_SCHEMA> # Not applicable for MySQL
```
Refer to export schema for details about the arguments.
Analyze schema
The schema exported in the previous step may not yet be suitable for importing into YugabyteDB. Even though YugabyteDB is PostgreSQL compatible, given its distributed nature, you may need to make minor manual changes to the schema.
The `yb-voyager analyze-schema` command analyzes the PostgreSQL schema dumped in the export schema step, and prepares a report that lists the DDL statements which need manual changes. An example invocation of the command is as follows:
```sh
# Replace the argument values with those applicable for your migration.
yb-voyager analyze-schema --export-dir <EXPORT_DIR> --output-format <FORMAT>
```
The above command generates a report file under the `EXPORT_DIR/reports/` directory.
Refer to analyze schema for details about the arguments.
Manually edit the schema
Fix all the issues listed in the generated schema analysis report by manually editing the SQL DDL files under `EXPORT_DIR/schema/*`.
After making the manual changes, re-run the `yb-voyager analyze-schema` command. This generates a fresh report reflecting your changes. Repeat these steps until the generated report contains no issues.
To learn more about modeling strategies using YugabyteDB, refer to Data modeling.
Manual schema changes
- `CREATE INDEX CONCURRENTLY` is not currently supported in YugabyteDB. You should remove the `CONCURRENTLY` clause before trying to import the schema.
- Include the primary key definition in the `CREATE TABLE` statement. A primary key cannot be added to a partitioned table using the `ALTER TABLE` statement.
Export data
Dump the source data into the `EXPORT_DIR/data` directory using the `yb-voyager export data` command as follows:
```sh
# Replace the argument values with those applicable for your migration.
yb-voyager export data --export-dir <EXPORT_DIR> \
        --source-db-type <SOURCE_DB_TYPE> \
        --source-db-host <SOURCE_DB_HOST> \
        --source-db-user <SOURCE_DB_USER> \
        --source-db-password <SOURCE_DB_PASSWORD> \ # Enclose the password in single quotes if it contains special characters.
        --source-db-name <SOURCE_DB_NAME> \
        --source-db-schema <SOURCE_DB_SCHEMA> # Not applicable for MySQL
```
Note that the `source-db-schema` argument is required for PostgreSQL and Oracle, and is not applicable for MySQL.
Refer to export data for details about the arguments, and export data status to track the status of an export operation.
The options passed to the command are similar to the `yb-voyager export schema` command. To export only a subset of the tables, pass a comma-separated list of table names in the `--table-list` argument.
Sequence migration considerations
Sequence migration consists of two steps: sequence creation and setting the resume value (the resume value refers to the `NEXTVAL` of a sequence on the source database). A sequence object is generated during export schema, and the resume values for sequences are generated during export data. These resume values are then set on the target database just after the data is imported for all tables.
Note that there are some special cases involving sequences such as the following:
- In MySQL, an auto-increment column is migrated to YugabyteDB as a normal column with a sequence attached to it.
- For PostgreSQL, the `SERIAL` data type and `GENERATED AS IDENTITY` columns use a sequence object internally, so resume values for them are also generated during data export.
Accelerate data export for MySQL and Oracle
For MySQL and Oracle, you can optionally speed up data export by setting the environment variable `BETA_FAST_DATA_EXPORT=1` when you run `export data` using yb-voyager.
Consider the following caveats before using the feature:
- You need to perform additional steps when you prepare the source database.
- Some data types are unsupported. For a detailed list, refer to datatype mappings.
- The `--parallel-jobs` argument has no effect.
- SSL is unsupported.
- In MySQL RDS, writes are not allowed during the data export process.
- For Oracle, resume value generation is unsupported for sequences that are not attached to a column.
Import schema
Import the schema using the `yb-voyager import schema` command.
Usage for target_db_schema
The `target-db-schema` argument specifies the schema of the target database and is applicable only for MySQL and Oracle. By default, yb-voyager imports the source database into the `public` schema of the target database. By specifying the `--target-db-schema` argument during import, you can instruct yb-voyager to create a non-public schema and use it for the schema/data import.
An example invocation of the command is as follows:
```sh
# Replace the argument values with those applicable for your migration.
yb-voyager import schema --export-dir <EXPORT_DIR> \
        --target-db-host <TARGET_DB_HOST> \
        --target-db-user <TARGET_DB_USER> \
        --target-db-password <TARGET_DB_PASSWORD> \ # Enclose the password in single quotes if it contains special characters.
        --target-db-name <TARGET_DB_NAME> \
        --target-db-schema <TARGET_DB_SCHEMA> # MySQL and Oracle only
```
Refer to import schema for details about the arguments.
yb-voyager applies the DDL SQL files located in the `$EXPORT_DIR/schema` directory to the target database. If yb-voyager terminates before it imports the entire schema, you can rerun it by adding the `--ignore-exist` option.
Importing indexes and triggers
Because the presence of indexes and triggers can slow down the rate at which data is imported, by default `import schema` does not import indexes and triggers. You should complete the data import without creating indexes and triggers. Only after data import is complete, create indexes and triggers using the `import schema` command with an additional `--post-import-data` flag.
Import data
After you have successfully exported the source data and imported the schema in the target database, you can import the data using the `yb-voyager import data` command as follows:
```sh
# Replace the argument values with those applicable for your migration.
yb-voyager import data --export-dir <EXPORT_DIR> \
        --target-db-host <TARGET_DB_HOST> \
        --target-db-user <TARGET_DB_USER> \
        --target-db-password <TARGET_DB_PASSWORD> \ # Enclose the password in single quotes if it contains special characters.
        --target-db-name <TARGET_DB_NAME> \
        --target-db-schema <TARGET_DB_SCHEMA> \ # MySQL and Oracle only.
        --parallel-jobs <NUMBER_OF_JOBS>
```
By default, yb-voyager creates C/2 connections, where C is the total number of cores in the cluster. You can change the default number of connections using the `--parallel-jobs` argument. If yb-voyager fails to determine the number of cores in the cluster, it defaults to 2 connections per node.
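The connection-count heuristic above can be sketched as follows. This is a simplified illustration of the stated C/2 and 2-per-node rules; the function name and exact fallback behavior are assumptions, not yb-voyager internals:

```python
def default_parallel_jobs(total_cores, num_nodes):
    """Sketch of the default --parallel-jobs heuristic: C/2 connections
    when the total core count C is known, otherwise 2 per node."""
    if total_cores is not None and total_cores > 0:
        return max(1, total_cores // 2)
    return 2 * num_nodes

# A 3-node cluster with 8 cores per node (24 total) gets 12 connections.
print(default_parallel_jobs(24, 3))    # -> 12
# If the core count cannot be determined, fall back to 2 per node.
print(default_parallel_jobs(None, 3))  # -> 6
```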
Refer to import data for details about the arguments.
yb-voyager splits the data dump files (from the `$EXPORT_DIR/data` directory) into smaller batches, and concurrently ingests the batches so that all nodes of the target YugabyteDB cluster are used. This phase is designed to be restartable: if yb-voyager terminates while the data import is in progress, the import resumes from its current state after a restart.
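Conceptually, the splitting step chunks each data file into fixed-size batches that can be ingested independently and retried after a restart. A minimal sketch of the idea (the batch size and function name here are illustrative only, not yb-voyager's actual values):

```python
def split_into_batches(rows, batch_size):
    """Split a sequence of data rows into fixed-size batches that can be
    ingested concurrently; the final batch may be smaller."""
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]

rows = [f"row-{n}" for n in range(10)]
batches = split_into_batches(rows, 4)
print([len(b) for b in batches])  # -> [4, 4, 2]
```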
Importing large datasets
When importing a very large database, run the import data command in a `screen` session, so that the import is not terminated when the terminal session ends.

If the `yb-voyager import data` command terminates before completing the data ingestion, you can re-run it with the same arguments, and the command will resume the data import operation.
Import data status
Run the `yb-voyager import data status --export-dir <EXPORT_DIR>` command to get the overall progress of the data import operation.
Import data file
If all your data files are in CSV format and you have already created the schema in your target YugabyteDB database, you can use the `yb-voyager import data file` command to load the data into the target tables directly from the CSV file(s). This command doesn't require the other migration steps (export and analyze schema, export data, or import schema) prior to the import; it only requires that the target tables be present in the target database.
```sh
# Replace the argument values with those applicable for your migration.
yb-voyager import data file --export-dir <EXPORT_DIR> \
        --target-db-host <TARGET_DB_HOST> \
        --target-db-user <TARGET_DB_USER> \
        --target-db-password <TARGET_DB_PASSWORD> \ # Enclose the password in single quotes if it contains special characters.
        --target-db-name <TARGET_DB_NAME> \
        --target-db-schema <TARGET_DB_SCHEMA> \ # MySQL and Oracle only
        --data-dir </path/to/files/dir/> \
        --file-table-map <filename1:table1,filename2:table2> \
        --delimiter <DELIMITER> \
        --has-header \
        --null-string "<NULL_STRING>"
```
Refer to import data file for details about the arguments.
Import data file from AWS S3
The import data file command also allows you to load data directly from files stored in AWS S3. Provide the S3 bucket URI in the `--data-dir` flag as follows:

```sh
yb-voyager import data file .... \
        --data-dir s3://voyager-data
```
Refer to import data file for additional details to set up your S3 bucket with yb-voyager.
Import indexes and triggers
Import indexes and triggers using the `import schema` command with an additional `--post-import-data` flag as follows:
```sh
# Replace the argument values with those applicable for your migration.
yb-voyager import schema --export-dir <EXPORT_DIR> \
        --target-db-host <TARGET_DB_HOST> \
        --target-db-user <TARGET_DB_USER> \
        --target-db-password <TARGET_DB_PASSWORD> \ # Enclose the password in single quotes if it contains special characters.
        --target-db-name <TARGET_DB_NAME> \
        --target-db-schema <TARGET_DB_SCHEMA> \ # MySQL and Oracle only
        --post-import-data
```
Refer to import schema for details about the arguments.
Verify migration
After the schema and data import is complete, the automated part of the database migration process is considered complete. You should manually run validation queries on both the source and target databases to ensure that the data is correctly migrated. For example, a validation query can check the row count of each table in both databases.
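One way to automate the row-count check described above is to collect per-table counts from each database (for example, with a `SELECT count(*)` query per table) and diff the two result sets. The helper below is a hypothetical sketch that compares two already-collected count maps:

```python
def compare_row_counts(source_counts, target_counts):
    """Return the tables whose row counts differ between source and target.
    Both arguments map table name -> row count collected from each database;
    a table missing on one side is treated as having 0 rows there."""
    mismatches = {}
    for table in set(source_counts) | set(target_counts):
        src = source_counts.get(table, 0)
        tgt = target_counts.get(table, 0)
        if src != tgt:
            mismatches[table] = (src, tgt)
    return mismatches

source = {"orders": 1000, "customers": 250}
target = {"orders": 1000, "customers": 248}
print(compare_row_counts(source, target))  # -> {'customers': (250, 248)}
```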
Caveat associated with rows reported by import data status
Suppose you have a scenario where:

- The import data or import data file command fails.
- To resolve this issue, you delete some of the rows from the split files.
- After retrying, the import data command completes successfully.

In this scenario, the import data status command reports an incorrect imported row count, because it doesn't take the deleted rows into account.

For more details, refer to the GitHub issue #360.