Wednesday, October 18, 2017

AWS S3 Transferring Data Across Accounts

  Today I successfully transferred some data on AWS S3 from one account to another. 
  In the process I resolved an encryption-related permission issue that has very little coverage on Google, partly because the error message is misleading. 
  So I decided to write this down for anyone who needs the help.

Goal:
Copy data from one S3 bucket to another S3 bucket.

Resources:
 - source account: "src_account"
 - source bucket: "src_bucket"
 - destination account: "dst_account"
 - destination bucket: "dst_bucket"
 - a machine with the AWS CLI installed (your laptop works too)


Steps (high level):
 - create a user on the destination account.
 - grant this user permissions via an IAM policy:
   - read from the source bucket, using the "Resource" field and the bucket ARN.
   - write to the destination bucket.
 - grant this user read access to the source bucket via the source bucket's policy, using the "Principal" field and the user's ARN.
 - (if encryption is required) grant the destination account access to the encryption key on the source account.
 - (if encryption is required) grant the user permission to use the key for:
   - decryption, required for reading.
   - encryption, required for writing.


Steps (detailed):

 - create a user on the destination account: <sync_user>
  Keep the user's access key ID and secret access key to set up the CLI later.
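  If you prefer the CLI to the console, this step looks roughly like the following (run with destination-account admin credentials; the user name is just a placeholder):

    aws iam create-user --user-name <sync_user>
    aws iam create-access-key --user-name <sync_user>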

 - under IAM on the destination account, attach a policy to <sync_user> with these statements:
        {
            "Sid": "AllowReadSource",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::<src_bucket>/*",
                "arn:aws:s3:::<src_bucket>"
            ]
        },
        {
            "Sid": "AllowWriteDestination",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3: PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::<dst_bucket>/*",
                "arn:aws:s3:::<dst_bucket>"
            ]
        }
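
  The same policy can be attached from the CLI; a sketch, assuming the two statements above are wrapped in a full policy document ("Version" plus a "Statement" array) and saved as policy.json (the file name and policy name are placeholders):

    aws iam put-user-policy --user-name <sync_user> --policy-name s3-cross-account-sync --policy-document file://policy.json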

 - under the <src_bucket> Permissions tab (on the source account), add these statements to the bucket policy:
        {
            "Sid": "AllowReadOnlyOnFileForUser",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<dst_account>:user/<sync_user>"
            },
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::<src_bucket>/*"
        },
        {
            "Sid": "AllowReadOnlyOnDirectoryForUser",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<dst_account>:user/<sync_user>"
            },
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::<src_bucket>/*"
        }
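
  This change has to be made with source-account credentials. From the CLI it looks roughly like this (note that put-bucket-policy replaces the whole policy, so the file, here called bucket-policy.json as a placeholder, must contain the full policy including the statements above):

    aws s3api put-bucket-policy --bucket <src_bucket> --policy file://bucket-policy.json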

Additional steps if encryption is required for the bucket:

 - On the source account, add the destination account as an external account on each encryption key used:
  IAM -> Encryption Keys -> choose the right region -> select the key -> Add External Account -> <dst_account>
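  The console's "Add External Account" button edits the key policy for you. Doing the same from the CLI is roughly a two-step sketch (the edit adds a statement allowing "arn:aws:iam::<dst_account>:root" to use the key):

    # dump the current key policy (run with source-account credentials, in the key's region)
    aws kms get-key-policy --key-id <key_id> --policy-name default --output text > key-policy.json
    # edit key-policy.json to allow arn:aws:iam::<dst_account>:root to use the key, then upload it again
    aws kms put-key-policy --key-id <key_id> --policy-name default --policy file://key-policy.json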

 - On the destination account, attach another policy to <sync_user> with the following statement:
        {
            "Sid": "AllowUseOfTheKey",
            "Effect": "Allow",
            "Action": [
                "kms:Decrypt",
                "kms:GenerateDataKey*"
            ],
            "Resource": [
                "arn:aws:kms:<region>:<src_account>:key/<key_id>"
            ]
        }
 - add another statement to the destination bucket's policy:
        {
            "Sid": "Ensure config is encrypted on upload",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::<dst_bucket>/*",
            "Condition": {
                "StringNotLike": {
                    "s3:x-amz-server-side-encryption-aws-kms-key-id": "arn:aws:kms:<region>:<src_account>:key/<key_id>"
                }
            }
        }
CLI:
 - add the created <sync_user> to the CLI as a profile:
  aws configure --profile <sync_user>
Note: the region needs to match the region of the KMS key used (if any).
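Before copying anything, a quick sanity check of the permissions above is to list both buckets with the new profile:
  aws s3 ls s3://<src_bucket> --profile <sync_user>
  aws s3 ls s3://<dst_bucket> --profile <sync_user>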

Command:
If the files inside the bucket require server-side encryption:
  aws s3 cp s3://<src_bucket> s3://<dst_bucket> --recursive --sse aws:kms --sse-kms-key-id arn:aws:kms:<region>:<src_account>:key/<key_id> --profile <sync_user>

Otherwise:
  aws s3 cp s3://<src_bucket> s3://<dst_bucket> --recursive --profile <sync_user>


Summary:
  This is a fairly simple process, and some online documents explain the steps better than I just did. 
  I have attached some of my own reasoning to help explain why each step is needed. The permissions used here are the minimal set required for the transfer; you can find templates that work by granting broader permissions, but I see no reason to give this user more than it needs.
  I spent a lot of time on the encryption permission issue, since it is documented nowhere and the error surfaces as just another permission denial, so it took a long time to figure out what was causing it. I really hope this helps if you run into a similar issue.

Note: encryption is applied at the per-file level and can be heterogeneous within a single bucket. If you run into a permission error on the "GetObject" action, double-check whether the file causing the issue has encryption enabled.

Tuesday, May 30, 2017

Performance between BoneCP and HikariCP



Over the past week I've been assessing HikariCP as a replacement for BoneCP on my server, and the result was somewhat surprising to me.
I'm sharing it here in case other people are doing the same thing.

The short conclusion is: BoneCP is slightly faster than HikariCP.

Test environment:
 - BoneCP version: 0.8.0-RELEASE
 - HikariCP version: 2.6.1
 - Tested on 2 groups of servers located on Amazon AWS, with DB servers in the same availability zone.
 - The only difference between the deployed jars is the connection pooling library (and its corresponding configuration)

Variations in configuration do not seem to make a big difference.

I tested with the recommended configuration from each library:
 - 1st test: map the configuration options across libraries by meaning
 - 2nd test: match the number of connections per host

Both tests gave me the same result: the measured wall time with HikariCP is consistently 1~2ms higher than with BoneCP.

This was tested on a live product that has >100K concurrent users at all times, and the range of tested queries covers the cases in a few benchmark tests.

For fast database queries this can be pretty significant: an indexed select from a table costs 1ms on average for the BoneCP group but 2ms for the HikariCP group.

Similarly, this affects other queries, including inserts, updates, and deletes. Across the range of queries I have, the difference is 1~2ms.

After reading through this:
https://github.com/brettwooldridge/HikariCP/wiki/Pool-Analysis

I started to wonder whether the connection validation overhead was causing the performance difference.

The awesome developer of HikariCP told me there are ways to configure that:
https://github.com/brettwooldridge/HikariCP/issues/900

So I did a 3rd and 4th test:
- increasing the validation check window from 500ms to 5s
- overriding the connection test query from the JDBC isValid() method to a simple "SELECT 1"
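For reference, this is roughly how those two changes are made; a sketch, where the bypass-window system property name is taken from the HikariCP source/wiki and the test query assumes a database that accepts "SELECT 1":

  # JVM system property: widen the 500ms validation bypass window to 5s
  -Dcom.zaxxer.hikari.aliveBypassWindowMs=5000
  # HikariCP pool property: replace the default Connection.isValid() check with a plain query
  connectionTestQuery=SELECT 1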

Unfortunately the result was the same, and the difference was roughly the same too.
So the connection test is not the culprit for the performance difference, at least in my environment.

Although I think the validation check is a good feature and should stay, I stopped at this point because, given the performance results, BoneCP will probably remain my go-to option.

For now I'm unable to explain the performance difference, and that is the part that still needs an update. I'll dig into the source code a bit further when I have more time to spend on this.